Ancient DNA: Methods and Protocols [2nd ed.] 978-1-4939-9175-4, 978-1-4939-9176-1

This fully updated second edition explores protocols that address the most challenging aspects of experimental work in a


English Pages X, 216 [214] Year 2019


Table of contents :
Front Matter ....Pages i-x
Setting Up an Ancient DNA Laboratory (Tara L. Fulton, Beth Shapiro)....Pages 1-13
Pretreatment: Removing DNA Contamination from Ancient Bones and Teeth Using Sodium Hypochlorite and Phosphate (Petra Korlević, Matthias Meyer)....Pages 15-19
Pretreatment: Improving Endogenous Ancient DNA Yields Using a Simple Enzymatic Predigestion Step (Hannes Schroeder, Peter de Barros Damgaard, Morten E. Allentoft)....Pages 21-24
Extraction of Highly Degraded DNA from Ancient Bones and Teeth (Jesse Dabney, Matthias Meyer)....Pages 25-29
Sampling and Extraction of Ancient DNA from Sediments (Laura S. Epp, Heike H. Zimmermann, Kathleen R. Stoof-Leichsenring)....Pages 31-44
Extraction of Ancient DNA from Plant Remains (Nathan Wales, Logan Kistler)....Pages 45-55
DNA Extraction from Keratin and Chitin (Paula F. Campos, M. Thomas P. Gilbert)....Pages 57-63
Double-Stranded Library Preparation for Ancient and Other Degraded Samples (Kirstin Henneberger, Axel Barlow, Johanna L. A. Paijmans)....Pages 65-73
A Method for Single-Stranded Ancient DNA Library Preparation (Marie-Theres Gansauge, Matthias Meyer)....Pages 75-83
Sequencing Library Preparation from Degraded Samples for Non-illumina Sequencing Platforms (Renata F. Martins, Marie-Louise Kampmann, Daniel W. Förster)....Pages 85-92
Whole-Genome Capture of Ancient DNA Using Homemade Baits (Gloria González Fortes, Johanna L. A. Paijmans)....Pages 93-105
Generating RNA Baits for Capture-Based Enrichment (Noah Snyder-Mackler, Tawni Voyles, Jenny Tung)....Pages 107-120
Hybridization Capture of Ancient DNA Using RNA Baits (André E. R. Soares)....Pages 121-128
Application of Solid-State Capture for the Retrieval of Small-to-Medium Sized Target Loci from Ancient DNA (Johanna L. A. Paijmans, Gloria González Fortes, Daniel W. Förster)....Pages 129-139
Targeted PCR Amplification and Multiplex Sequencing of Ancient DNA for SNP Analysis (Saskia Wutke, Arne Ludwig)....Pages 141-147
Targeted Amplification and Sequencing of Ancient Environmental and Sedimentary DNA (Ruth V. Nichols, Emily Curd, Peter D. Heintzman, Beth Shapiro)....Pages 149-161
Authentication and Assessment of Contamination in Ancient DNA (Gabriel Renaud, Mikkel Schubert, Susanna Sawyer, Ludovic Orlando)....Pages 163-194
Assembly of Ancient Mitochondrial Genomes Without a Closely Related Reference Sequence (Christoph Hahn)....Pages 195-213
Back Matter ....Pages 215-216


Methods in Molecular Biology 1963

Beth Shapiro · Axel Barlow · Peter D. Heintzman · Michael Hofreiter · Johanna L. A. Paijmans · André E. R. Soares, Editors

Ancient DNA

Methods and Protocols Second Edition

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes: http://www.springer.com/series/7651

Ancient DNA Methods and Protocols Second Edition

Edited by

Beth Shapiro Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA

Axel Barlow Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany

Peter D. Heintzman Tromsø University Museum, UiT—The Arctic University of Norway, Tromsø, Norway

Michael Hofreiter, Johanna L. A. Paijmans Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany

André E. R. Soares Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil


ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-4939-9175-4 ISBN 978-1-4939-9176-1 (eBook) https://doi.org/10.1007/978-1-4939-9176-1 Library of Congress Control Number: 2019933331 © Springer Science+Business Media, LLC, part of Springer Nature 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.

Preface

Research in ancient DNA began during the early 1980s with the publication of short mitochondrial DNA sequence fragments from a quagga, an extinct subspecies of the plains zebra. Following this, several institutions invested resources to develop both the laboratory infrastructure and research expertise to recover trace amounts of degraded DNA from ancient samples. For many years, the field remained small, as researchers grappled with the challenges of inhibition, degradation, and contamination of ancient DNA extracts.

Today’s broad application of ancient DNA as a research tool is owed to two technical innovations within the last 25 years: the invention of the polymerase chain reaction, or PCR, and the development of economical and high-throughput sequencing technologies. PCR allowed the targeted retrieval of small fragments of DNA that, in fortunate circumstances, are preserved in fossil and museum specimens. High-throughput sequencing approaches made the recovery of many millions of preserved molecules economically viable and practical, which has pushed the field into the realm of genomics. These key innovations have allowed ancient DNA to become an increasingly widespread research tool, with the capacity to expand both the temporal and taxonomic breadth of questions that can be asked in ecological and evolutionary research.

As we noted in the first edition of this book, published in 2012, progress in ancient DNA research has been inherently technology driven. In our first collection of protocols, we attempted to summarize the most common approaches toward the retrieval and analysis of ancient DNA sequences. We began with guidelines for establishing an ancient DNA laboratory and described extraction protocols that had been optimized for a wide range of different substrates. We included instructions for ancient DNA-specific approaches to DNA extraction, preparing and performing PCR and genomic library preparation, and suggested analytical approaches for population genetic analyses of ancient DNA and the initial quality control of data recovered from high-throughput sequencing.

Many of the protocols included in the first edition remain relevant today. However, new protocols for both ancient DNA recovery and analysis have been introduced during the time since publication, and these have expanded significantly the range of samples from which ancient DNA can be recovered. These new protocols are the focus of this second edition.

We include in this second edition protocols that address the most challenging aspects of experimental work in ancient DNA: preparing ancient samples for DNA extraction, the DNA extraction itself, and transforming extracted ancient DNA molecules so that they can be sequenced on the different available sequencing platforms, which is also known as sequencing library preparation. We also include several chapters that discuss the analysis of high-throughput sequencing data recovered from ancient specimens, which, because of the degraded nature of ancient DNA and common co-extraction of contaminant DNA, has challenges that are unique compared to data recovered from modern specimens.

We begin with a chapter that discusses procedures for setting up an ancient DNA laboratory and authenticating recovered ancient DNA. Next, we include two chapters that describe methods to pretreat samples prior to ancient DNA extraction, both of which lead to improvements in the relative abundance of endogenous versus contaminant DNA. We then present four chapters that describe the latest innovations in retrieving ancient DNA from organic remains and environmental samples. We include updated protocols presenting double-stranded and single-stranded approaches to prepare genomic libraries, and several approaches to enrich ancient DNA extracts for molecular targets of interest. Finally, we include a chapter on data authenticity assessment, which is critical for all ancient DNA studies, and a chapter that includes a practical approach and troubleshooting advice for those aiming to assemble ancient genomes from next-generation sequencing data.

As in the first edition, the goal of this second edition is to present these protocols in a manner that makes them easily accessible for everyday use in the lab. We hope this book will be another useful source of information for both beginning and experienced researchers hoping to add ancient DNA to their professional toolkit.

We express our sincere thanks to all the authors for their willingness to share their time and their trade secrets, and to Prof. John Walker at Humana Press for giving us the opportunity to assemble this collection of protocols.

Beth Shapiro, Santa Cruz, CA, USA
Axel Barlow, Potsdam, Germany
Peter D. Heintzman, Tromsø, Norway
Michael Hofreiter, Potsdam, Germany
Johanna L. A. Paijmans, Potsdam, Germany
André E. R. Soares, Petrópolis, RJ, Brazil


Contributors

MORTEN E. ALLENTOFT · Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
AXEL BARLOW · Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany
PAULA F. CAMPOS · Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
EMILY CURD · Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, CA, USA
JESSE DABNEY · Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
PETER DE BARROS DAMGAARD · Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
LAURA S. EPP · Polar Terrestrial Environmental Systems, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Potsdam, Germany; Department of Biology, University of Konstanz, Konstanz, Germany
DANIEL W. FÖRSTER · Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin, Germany
TARA L. FULTON · Environment and Climate Change Canada, Edmonton, AB, Canada
MARIE-THERES GANSAUGE · Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
M. THOMAS P. GILBERT · Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
GLORIA GONZÁLEZ FORTES · Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
CHRISTOPH HAHN · Institute of Biology, University of Graz, Graz, Austria
PETER D. HEINTZMAN · Tromsø University Museum, UiT—The Arctic University of Norway, Tromsø, Norway
KIRSTIN HENNEBERGER · Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany
MARIE-LOUISE KAMPMANN · Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin, Germany; Section of Forensic Genetics, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
LOGAN KISTLER · Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
PETRA KORLEVIĆ · European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK; Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
ARNE LUDWIG · Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin, Germany
RENATA F. MARTINS · Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin, Germany; Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany
MATTHIAS MEYER · Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
RUTH V. NICHOLS · Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
LUDOVIC ORLANDO · Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark; Laboratoire d’Anthropobiologie Moléculaire et d’Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, University Paul Sabatier, Toulouse, France
JOHANNA L. A. PAIJMANS · Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany
GABRIEL RENAUD · Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
SUSANNA SAWYER · Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
HANNES SCHROEDER · Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
MIKKEL SCHUBERT · Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
BETH SHAPIRO · Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
NOAH SNYDER-MACKLER · Department of Evolutionary Anthropology, Duke University, Durham, NC, USA; Department of Psychology, University of Washington, Seattle, WA, USA
ANDRÉ E. R. SOARES · Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil
KATHLEEN R. STOOF-LEICHSENRING · Polar Terrestrial Environmental Systems, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Potsdam, Germany
JENNY TUNG · Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
TAWNI VOYLES · Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
NATHAN WALES · Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA; Laboratory of Molecular Anthropology and Image Synthesis, University Paul Sabatier, Toulouse, France; Department of Archaeology, University of York, York, UK
SASKIA WUTKE · Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, Finland
HEIKE H. ZIMMERMANN · Polar Terrestrial Environmental Systems, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Potsdam, Germany

Chapter 1

Setting Up an Ancient DNA Laboratory

Tara L. Fulton and Beth Shapiro

Abstract

Entering into the world of ancient DNA research is nontrivial. Because the DNA in most ancient specimens is degraded to some extent, the potential is high for contamination of ancient samples, ancient DNA extracts, and genomic sequencing libraries prepared from these extracts with non-degraded DNA from the present-day environment. To minimize the risk of contamination in ancient DNA environments, experimental protocols specific to handling ancient specimens, including those that outline the design and layout of laboratory space, have been introduced. Here, we outline challenges associated with working with ancient samples, including providing guidelines for setting up a new ancient DNA laboratory. We also discuss steps that can be taken at the sample collection and preparation stage to minimize the potential for contamination of ancient DNA experiments with exogenous sources of DNA.

Key words Ancient DNA, aDNA, DNA damage, Laboratory setup, Contamination, Subsampling, Sample preparation, Guidelines

1 Introduction

In 1984, DNA sequences were published that had been recovered from a museum-preserved sample of the extinct quagga, a relative of the zebra [1], marking the beginning of the field of ancient DNA. With the advent of the polymerase chain reaction (PCR) [2], ancient DNA researchers were able to target specific genomic fragments of interest, resulting in a significant increase in the breadth of topics that could be addressed using this new approach [3]. The power of ancient DNA is that it offers a window into past biota and evolutionary processes that is inaccessible using DNA from living organisms or paleontological studies alone. Over nearly the last three decades, ancient DNA has been used to address questions relating to, for example, the history and relationships of hominids [4–6], plant and animal domestication [7–12], population dynamics and diversity through time [13–18], and phylogenetics of extinct species [19–23]. Ancient DNA can be a powerful tool; however, it is one that should be handled with caution.

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_1, © Springer Science+Business Media, LLC, part of Springer Nature 2019


Table (caption and column headers other than "Solutions" not recovered): DNA damage in ancient specimens

Damage processes and lesions:
Desiccation, heat, chemicals, etc.
Direct cleavage (hydrolysis)
Depurination causes abasic site (hydrolysis)
Deamination causes miscoding lesions: (a) adenine to hypoxanthine (A to G); (b) cytosine to uracil (C to T); (c) 5-methylcytosine to thymine (C to T); (d) guanine to xanthine (G to A)
Base modifications: 8-oxoguanosine (miscoding G to T); 5-OH-hydantoin (blocking); 5-OH-5-methylhydantoin (blocking)
DNA-DNA cross-links via alkylation
DNA-protein cross-links (i.e., Maillard products)

Effects:
Low quantity of surviving DNA; short […]
Base misincorporations
Base misincorporation
No amplification
No amplification; jumping PCR

Solutions:
Amplify short […]
Multiple extractions and amplifications; cloning; UDG (uracil DNA glycosylase) to remove uracil
Special polymerases; cloning; multiple amplifications
PTB (N-phenacylthiazolium bromide) to cleave cross-links

[…] (1 h) do not yield significantly better results than do shorter incubation times (Fig. 1). Rather, the increase in endogenous content follows an asymptotic pattern as a function of time, because the majority of contaminant DNA is removed in the initial phase [3]. In fact, longer exposures can result in lower total DNA yields, resulting in extracts and libraries with low molecular complexity. Consequently, we advocate a brief (15–30 min) predigestion step prior to the full digestion.

2 Materials

The starting material is bone or tooth powder (see Notes 1 and 2). The procedure can be performed in 5 ml LoBind tubes (Eppendorf) or 15 ml Falcon tubes. The digestion buffer should be freshly prepared using molecular grade reagents.

2.1 Predigestion Buffer

1. 0.45 M EDTA, pH 8 (Sigma-Aldrich), 0.25 mg/ml proteinase K, pH 8.0 (Sigma-Aldrich).

3 Methods

3.1 Predigestion

1. Add 4 ml of predigestion buffer and suspend the sample powder by vortexing (see Note 3).
2. Incubate the sample on a rotor at 50 °C (see Note 4) for 15–30 min, depending on the amount of input material and the state of preservation. The sample tubes should be sealed with Parafilm to prevent leakage.
3. Centrifuge for 3 min at maximum speed. Pipette off and discard the supernatant (preserving the pellet).
4. Proceed to your selected extraction protocol [2].


Fig. 1 Effect of length of predigestion times on endogenous DNA contents. (a) Fold increase in endogenous DNA content in five different ancient DNA samples after different length predigestion times. (b) A logarithmic model fitted to the mean increase suggesting an asymptotic growth in endogenous DNA contents (p = 2.5e-5) [3]
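To make the idea of a logarithmic model concrete, the following is a minimal sketch of fitting y = a + b·ln(t) by ordinary least squares. It is not the analysis behind Fig. 1: the function names and the numbers are hypothetical, and the input values are synthetic, generated from known parameters purely so that the fit can be checked. Under such a model the gain per additional minute is b/t, which is why returns diminish quickly after the first 15–30 min even though the curve itself keeps rising slowly.

```python
import math

def fit_log_model(times_min, fold_increase):
    """Least-squares fit of y = a + b * ln(t); returns (a, b)."""
    xs = [math.log(t) for t in times_min]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(fold_increase) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, fold_increase))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def marginal_gain_per_min(b, t_min):
    """Derivative of a + b * ln(t): extra fold increase per extra minute at time t."""
    return b / t_min

# Synthetic, noise-free values generated from a = 1.0, b = 0.5.
# These are NOT measurements; they only demonstrate that the fit recovers a and b.
times = [15, 30, 60, 120, 240]
values = [1.0 + 0.5 * math.log(t) for t in times]

a, b = fit_log_model(times, values)
early = marginal_gain_per_min(b, 15)   # gain per minute at 15 min
late = marginal_gain_per_min(b, 60)    # gain per minute at 60 min (four times smaller)
```

With real data, the same function could be applied to the mean fold increase at each predigestion time; a significance test for the slope would need an additional step that is not shown here.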

4 Notes

1. Several studies have shown that the endogenous DNA fraction varies widely between tissues. The cementum-rich layer of tooth roots [3] and the petrous portion of the temporal bones [7] are both known to harbor well-preserved DNA.
2. We recommend crushing samples using a pestle and mortar. However, samples can also be homogenized using a microdismembrator, which is particularly useful when samples are very hard and dense. Alternatively, a dental tool or drill can be used to collect a fine powder by drilling a small hole into the sample, but caution should be applied so as not to overheat the sample when drilling: use low drill speeds and do not drill continuously for more than 20–30 s at a time. Powdered samples tend to digest more quickly than samples that have been crushed, due to the increased surface area; these samples will also be more affected by pretreatment than crushed samples.
3. If any dirt particles are observed suspended in the pretreatment buffer, they should be removed carefully with a pipette.
4. Proteinase K is a broad-spectrum serine protease that is used in DNA extractions to digest contaminating proteins. While the sample can be incubated at lower temperatures, the activity of the enzyme is increased by raising the temperature of the reaction from 37 °C to 50 °C or even 60 °C. Proteinase K activity is also increased by addition of denaturing agents like SDS or urea. Raising the temperature or adding denaturing agents might therefore result in shorter digestion times but could also result in DNA fragmentation, as this is a temperature-dependent decay process. This has not been systematically tested on ancient samples.

Acknowledgments

Centre for GeoGenetics is supported by the Danish National Research Foundation (DNRF94) and the Lundbeck Foundation. HS is supported by the European Research Council (ERC Synergy Project “Nexus1492”; FP7/2007-2013/grant agreement no. 319209) and the HERA JRP “Citigen” through the European Union’s Horizon 2020 research and innovation program under grant agreement no. 649307. MEA is funded by The Villum Foundation (Young Investigator Programme, grant no. 10120).

References

1. Kemp BM, Smith DG (2005) Use of bleach to eliminate contaminating DNA from the surface of bones and teeth. Forensic Sci Int 154:53–61
2. Malmström H, Storå J, Dalén L, Holmlund G, Götherström A (2005) Extensive human DNA contamination in extracts from ancient dog bones and teeth. Mol Biol Evol 22:2040–2047
3. Damgaard PB, Margaryan A, Schroeder H, Orlando L, Willerslev E, Allentoft ME (2015) Improving access to endogenous DNA in ancient bones and teeth. Sci Rep 5:11184
4. Korlević P, Gerber T, Gansauge M-T, Hajdinjak M, Nagel S, Aximu-Petri A et al (2015) Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. BioTechniques 59:87–93
5. Boessenkool S, Hanghøj K, Nistelberger HM, Der Sarkissian C, Gondek AT, Orlando L et al (2017) Combining bleach and mild predigestion improves ancient DNA recovery from bones. Mol Ecol Resour 17:742–751
6. Campos PF, Craig OE, Turner-Walker G, Peacock E, Willerslev E, Gilbert MTP (2012) DNA in ancient bone – where is it located and how should we extract it? Ann Anat 194:7–16
7. Gamba C, Jones ER, Teasdale MD, McLaughlin RL, Gonzalez-Fortes G, Mattiangeli V et al (2014) Genome flux and stasis in a five millennium transect of European prehistory. Nat Commun 5:5257

Chapter 4

Extraction of Highly Degraded DNA from Ancient Bones and Teeth

Jesse Dabney and Matthias Meyer

Abstract

We provide a DNA extraction protocol optimized for the recovery of highly fragmented molecules preserved within bones and teeth. In this method, the hard tissue matrix is degraded using an EDTA/Proteinase K lysis buffer, and the DNA is purified using spin columns with silica membranes. This method efficiently recovers molecules as short as 35 base pairs.

Key words Ancient DNA, DNA extraction, Degraded DNA, Silica purification, Paleogenomics

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_4, © Springer Science+Business Media, LLC, part of Springer Nature 2019

1 Introduction

Ancient DNA molecules are highly fragmented, and it has been shown that an inverse exponential relationship exists between molecule length and abundance [1]. Whereas molecules shorter than approximately 50 bp are not suitable templates for direct amplification of genomic targets by PCR, the advent of high-throughput sequencing and development of ancient DNA-specific protocols to construct genomic libraries have enabled sequencing of much shorter DNA fragments than was possible using PCR. Here we describe a DNA extraction protocol that has been optimized for the extraction of DNA fragments as short as 35 base pairs (bp), and that therefore recovers the full size spectrum of molecules that can be used for mapping to reference genomes. The protocol is suitable for and has been optimized for hard samples such as bones and teeth [2]. In this protocol, DNA is first released from a pulverized sample by dissolving the bone/tooth matrix in a minimal extraction buffer containing EDTA and Proteinase K. DNA is then bound to silica in the presence of guanidine hydrochloride and isopropanol, purified, and eluted in a low-salt buffer. The protocol described here has been applied successfully to a large number of Holocene and Pleistocene bones and teeth, including the ~430,000-year-old skeletal material from the Sima de los Huesos, Spain [2, 3].
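The inverse exponential relationship between fragment length and abundance explains why lowering the recovery cutoff from roughly 50 bp to 35 bp matters. The sketch below is illustrative only: it assumes abundance proportional to exp(−λ·L), and the decay constant λ (here 0.02 per bp), the 1000 bp upper limit, and the function names are hypothetical choices rather than values from this chapter. For a real sample, λ would have to be estimated from its observed fragment length distribution.

```python
import math

def relative_abundance(length_bp, decay_per_bp):
    """Relative number of molecules of a given length under exp(-lambda * L)."""
    return math.exp(-decay_per_bp * length_bp)

def share_in_window(lo_bp, hi_bp, max_bp, decay_per_bp):
    """Share of molecules with lo_bp <= length < hi_bp among all molecules
    with lo_bp <= length <= max_bp."""
    window = sum(relative_abundance(n, decay_per_bp) for n in range(lo_bp, hi_bp))
    total = sum(relative_abundance(n, decay_per_bp) for n in range(lo_bp, max_bp + 1))
    return window / total

# Hypothetical decay constant; NOT a value reported in the text.
LAMBDA_PER_BP = 0.02

# Share of recoverable molecules (>= 35 bp) that a 50 bp cutoff would miss.
missed_by_50bp_cutoff = share_in_window(35, 50, 1000, LAMBDA_PER_BP)
```

Under these assumed parameters, roughly a quarter of all molecules of 35 bp or longer fall in the 35–49 bp range; a steeper decay (larger λ, as expected for more degraded samples) makes that share larger.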

2 Materials

2.1 Chemicals and Reagents

0.5 M EDTA pH 8.0.
1 M Tris–HCl pH 8.0.
Proteinase K.
Guanidine hydrochloride.
3 M sodium acetate pH 5.2.
PE buffer (Qiagen).
Tween-20.

2.2 Consumables

1.5 mL tubes (see Note 1).
2.0 mL tubes.
15 mL tubes.
High Pure Viral Nucleic Acid Large Volume Kit (Roche).
(Optional) Bottle Top Filter, 500 mL, 45 mm neck diameter, 0.22 μm pore.

2.2.1 For an Alternative Spin-Column Assembly

MinElute PCR Purification Kit (spin columns).
Extension reservoir (e.g., from Zymo-Spin V columns, Zymo Research).
50 mL conical bottom tubes.

2.3 Equipment

Large centrifuge able to handle 50 mL conical tubes.
Benchtop centrifuge.
Incubator and rotation device (or ThermoMixer).
Scale.
(Optional) Vacuum pump.

3

Methods

3.1 Prepare Buffers (See Note 2)

Extraction Buffer (0.45 M EDTA, 0.25 mg/mL Proteinase K, 0.05% Tween-20 [final concentrations]). Example (10 mL): 1. 745 μL H2O. 2. 9 mL EDTA (0.5 M, pH 8).

Ancient DNA Extraction in Silica Columns

27

3. 250 μL Proteinase K (10 mg/mL).
4. 5 μL Tween-20.

Binding Buffer (5 M GuHCl, 40% isopropanol [final concentrations]). Example (50 mL) (see Note 3):
1. 23.88 g GuHCl.
2. Water to 30 mL.
3. 20 mL isopropanol.
4. 25 μL Tween-20.

(Optional) Filter the binding buffer through a Corning filter with a vacuum pump to remove residual particles from GuHCl. Prepare excess buffer as some liquid is lost during this step.

TET Buffer (1 mM EDTA, 10 mM Tris–HCl, 0.05% Tween-20 [final concentrations]). Example (50 mL):
1. 100 μL EDTA (0.5 M, pH 8).
2. 500 μL Tris–HCl (1 M, pH 8).
3. 25 μL Tween-20.
4. Water to 50 mL.

3.2 Prepare Samples

1. Collect 10–150 mg of powdered sample in a 2 mL tube (see Note 4).
2. Add 1 mL of extraction buffer. Mix well by vortexing.
3. Incubate 16–24 h, rotating at 37 °C. (See Note 5 for setup instructions if using the alternative spin-column assembly.)

3.3 DNA Binding, Wash, and Elution

1. For each sample and control, transfer ~10 mL of binding buffer to a labeled 15 mL tube (see Note 6), and add 400 μL 3 M sodium acetate.
2. Centrifuge samples from step 3 of Subheading 3.2 for 2 min at maximum speed in a benchtop centrifuge to pellet residual solid.
3. Transfer supernatant to the 15 mL tube containing binding buffer. Mix gently by shaking. The pellet can be saved for future use.
4. Pour the sample/binding buffer mixture into the reservoir of the spin-column assembly, and close the 50 mL tube with a screw cap. Centrifuge for 4 min at 400 × g. Rotate tubes 90° and centrifuge for another 2 min at 400 × g.
5. Remove the screw cap from the 50 mL tube, and transfer the spin-column assembly to a clean 2 mL collection tube. Carefully remove and discard the extension reservoir. If desired, the 50 mL tube with flow-through can be stored at −20 °C.


6. Close and label the spin-column cap.
7. Perform a dry spin for 1 min at 3000 × g in a benchtop centrifuge. Discard any flow-through.
8. Add 750 μL PE buffer to each column. Centrifuge for 30 s at 3000 × g. Discard flow-through.
9. Repeat step 8 for a total of two washes.
10. Perform a dry spin for 1 min at maximum speed (~16,000 × g), turning the columns in the centrifuge 180° relative to their previous orientation.
11. Transfer the column to a clean 1.5 mL tube.
12. Add 50 μL TET buffer directly onto the silica membrane. Let sit for 5 min.
13. Centrifuge 1 min at maximum speed.
14. Repeat steps 12 and 13 by transferring the eluate back onto the silica membrane, so that the final elution volume remains 50 μL.
15. Transfer the eluate (final DNA extract) to a clean 1.5 mL tube. Extracts can be stored at −20 °C.
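The centrifugation steps above are given as relative centrifugal force (× g), whereas many centrifuges are set in revolutions per minute. As a minimal sketch, the standard relationship RCF = 1.118 × 10^-5 × r × rpm^2 (r in cm) can be used to convert between the two. The rotor radius must be read from the rotor documentation; the 10 cm used in the example is a hypothetical value, and the function names are illustrative.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF (x g)."""
    if rcf_g < 0 or radius_cm <= 0:
        raise ValueError("RCF must be >= 0 and radius must be > 0")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """RCF (x g) produced at a given rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    radius_cm = 10.0  # hypothetical rotor radius; check your rotor manual
    for rcf in (400, 3000, 16000):
        print(f"{rcf} x g at r = {radius_cm} cm: "
              f"{rpm_for_rcf(rcf, radius_cm):.0f} rpm")
```

Because the required speed scales with the square root of the force and depends on the rotor radius, the same × g value corresponds to different rpm settings in the large and the benchtop centrifuge.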

4

Notes

1. We recommend Eppendorf LoBind tubes or similar that reduce the loss of DNA due to tube-wall effects.
2. Buffers can be UV irradiated before use.
3. Add salt to a 50 mL tube, and fill with water to 30 mL, using the graduations on the tube. Mix to dissolve salt, and heat briefly in a microwave if necessary. Add remaining reagents.
4. Powder or pulverized material can be generated with, e.g., a drill, freezer mill, bead homogenizer, or mortar and pestle. More than 150 mg is not recommended.
5. Treat extension reservoirs with bleach and rinse well with molecular biology grade water. Let dry and UV irradiate. Force the extension reservoir into the opening of a MinElute spin column. Remove the extension reservoir-MinElute assembly from the MinElute collection tube, and place it in a 50 mL conical tube. Save the collection tube for subsequent wash steps. The extension reservoir-MinElute assembly may become detached during centrifugation. It is advisable to test the assembly with a dry run in the centrifuge.
6. Using the graduations on the tube is sufficient.
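The example recipes in Subheading 3.1 can be checked against their stated final concentrations with the dilution relationship C1 × V1 = C2 × V2. The sketch below does this for the three buffers. It assumes a molar mass of 95.53 g/mol for guanidine hydrochloride; the helper function name is illustrative and not part of the protocol.

```python
GUHCL_MOLAR_MASS = 95.53  # g/mol, guanidine hydrochloride


def final_conc(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Final concentration after dilution (C1 * V1 = C2 * V2).
    stock_vol and total_vol must use the same unit."""
    return stock_conc * stock_vol / total_vol


# Extraction buffer, 10 mL total (volumes in uL).
ext_total = 745 + 9000 + 250 + 5
ext_edta_molar = final_conc(0.5, 9000, ext_total)      # M
ext_protk_mg_ml = final_conc(10, 250, ext_total)       # mg/mL
ext_tween_pct = 5 / ext_total * 100                    # % (v/v)

# Binding buffer, 50 mL total.
bind_guhcl_molar = 23.88 / GUHCL_MOLAR_MASS / 0.050    # mol / L
bind_isopropanol_pct = 20 / 50 * 100                   # % (v/v)

# TET buffer, 50 mL total (volumes in uL, concentrations in mM).
tet_edta_mm = final_conc(500, 100, 50000)
tet_tris_mm = final_conc(1000, 500, 50000)
tet_tween_pct = 25 / 50000 * 100

if __name__ == "__main__":
    print(f"Extraction: {ext_edta_molar:.2f} M EDTA, "
          f"{ext_protk_mg_ml:.2f} mg/mL Proteinase K, "
          f"{ext_tween_pct:.2f}% Tween-20")
    print(f"Binding: {bind_guhcl_molar:.2f} M GuHCl, "
          f"{bind_isopropanol_pct:.0f}% isopropanol")
    print(f"TET: {tet_edta_mm:.1f} mM EDTA, {tet_tris_mm:.1f} mM Tris-HCl, "
          f"{tet_tween_pct:.2f}% Tween-20")
```

The same helper can be used to scale the recipes to other volumes, provided the stock concentrations match those listed.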


References

1. Allentoft ME, Collins M, Harker D, Haile J, Oskam CL, Hale ML et al (2012) The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc R Soc B 279(1748):4724–4733
2. Dabney J, Knapp M, Glocke I, Gansauge M-T, Weihmann A, Nickel B et al (2013) Complete mitochondrial genome sequence of a middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc Natl Acad Sci USA 110(39):15758–15763
3. Meyer M, Fu Q, Aximu-Petri A, Glocke I, Nickel B, Arsuaga J-L et al (2014) A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505(7483):403–406

Chapter 5

Sampling and Extraction of Ancient DNA from Sediments

Laura S. Epp, Heike H. Zimmermann, and Kathleen R. Stoof-Leichsenring

Abstract

Environmental DNA preserved in sediments is rapidly gaining importance as a tool in paleoecology. Sampling procedures for sedimentary ancient DNA (sedaDNA) have to be well planned to ensure clean subsampling of the inside of sediment cores and avoid introducing contamination. Additionally, ancient DNA extraction protocols may need to be optimized for the recovery of DNA from sediments, which may contain inhibitors. Here we describe procedures for subsampling both nonfrozen and frozen sediment cores, and we describe an efficient method for ancient DNA extraction from such samples.

Key words Sedimentary ancient DNA (sedaDNA), Environmental DNA (eDNA), Sediment cores, Permafrost, Metagenomics, Metabarcoding

1

Introduction

Sedimentary deposits may contain DNA from a variety of organisms [1–7]. Such DNA is termed sedimentary DNA (sedDNA), environmental DNA (eDNA), or, if the samples are from ancient environmental contexts, sedimentary ancient DNA (sedaDNA) [3]. Often, sedimentary samples, such as cores, show a well-defined chronological stratigraphy, and ancient DNA isolated from these environmental samples can be used to reconstruct past species assemblages through metabarcoding [8] or shotgun metagenomic analyses [7]. The use of this source of DNA is rapidly gaining importance in paleoecological studies, in particular as its provenance, taphonomy, and the reliability of inferences are better understood [9, 10]. While early studies concentrated on samples from cold environments [1, 11], investigation of sedaDNA from temperate [12, 13] and tropical settings [14, 15] is also possible.

Sediments are usually sampled in the field by retrieving sediment cores, from which clean subsamples, suitable for ancient DNA (aDNA) analysis, are taken in a laboratory setting. The sampling procedure typically involves interdisciplinary teamwork between specialists for aDNA analyses and sedimentologists/geoscientists.

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_5, © Springer Science+Business Media, LLC, part of Springer Nature 2019


Laura S. Epp et al.

Clean subsampling for aDNA is crucial, because (1) in contrast to working with samples from single organisms, the species composition of environmental samples is not known beforehand and (2) the samples cannot be cleaned after sampling. Below, we describe a protocol for sampling nonfrozen sediment cores, as typically retrieved from lacustrine and marine environments, and frozen sediment cores, as typically retrieved from permafrost deposits.

Extraction protocols for sedaDNA research have to ensure an efficient removal of inhibitors, such as humic acids, while at the same time retaining a maximum of DNA in solution [16]. While several protocols have been developed recently ([7, 17], but see Note 1) and the coming years will see additional protocols and further improvements, we find that efficient extraction can be achieved using the protocol we describe below. The protocol is based on a kit that is available for purchase, and, while other modifications have been proposed [16], we find from comparative analysis that the version below is most efficient. This protocol is suitable for up to 10 g of sediment, but high yields and a high diversity of authentic ancient DNA can be recovered from sediment quantities as small as 2 g.

2

Materials

2.1 Sampling

Sampling of the sediments should be performed in a clean, sheltered environment. It is often not possible to perform subsampling of a complete core in an aDNA lab due to both space limitations and the fact that subsampling will typically be performed at core storage facilities rather than the institutions hosting an aDNA lab. An ideal workspace for core sampling is a windowless room, like a climate chamber, with work benches that can be thoroughly cleaned before sampling. Climate chambers have the advantage that the temperature can be regulated. Climate chambers are a prerequisite for clean sampling of frozen sediments, which should be performed at temperatures well below 0 °C (e.g., −10 °C). For nonfrozen sediments, keeping a low temperature during sampling is less critical, but sampling in a climate chamber at about 10 °C is ideal, as it allows meticulous, unhurried sampling and temporary storage of multiple core sections (see Note 2 on sampling duration).

2.1.1 Preparation of Tracer DNA (Optional)

Prepared tracer DNA can be spiked onto the outside of the core (see Note 3).

1. Vector DNA (pCR®4-TOPO®, Invitrogen). This kit includes primers T3 and T7 and dNTP mix.
2. AmpliTaq Gold DNA polymerase (Applied Biosystems). This kit includes magnesium chloride (MgCl2) and 10× Gold PCR buffer.

Sampling and DNA Extraction from Sediments


3. Molecular biology grade water.
4. Thermal cycler.
5. 50 mL centrifuge tube.
6. Absolute ethanol.

2.1.2 Nonfrozen Sediment Cores

1. Large sterile scalpel tips, e.g., #22 (at least two per sample).
2. Fitting stainless steel scalpel handles (at minimum two, preferably four or more).
3. Sterile syringes (BD Discardit™ II 5 mL (Becton Dickinson) or similar), one per sample.
4. Small kitchen knives with plastic handles (at least 10) for cutting the tops off the syringes.
5. Squirt bottles of 2% deconex® 11 Universal solution, 5% bleach solution, distilled water, and 70% ethanol.
6. Four plastic beakers of about 500 mL, to rinse utensils on the bench.
7. DNA-ExitusPlus™ (AppliChem), one bottle.
8. Sample tubes, e.g., tube 8 mL, 57 × 16.5 mm, polypropylene (60.542.024, Sarstedt). One is required per sample, and one per surface sample, if working with tracer DNA (see Note 3).
9. Measuring tape.
10. Clean room attire: coverall, face mask, hairnets (three per person), protective sleeves (three pairs per person), and two pairs of nitrile gloves.
11. Aluminum foil.
12. Plastic wrap.
13. UV cross-linker.
14. Small utensils: paper towels (three to four rolls), wiping cloth, trash bags, lab gloves, pens for labeling, and sticky tape.
15. Optional (see Note 3): prepared tracer DNA mixed with absolute ethanol in a 50 mL centrifuge tube (see Subheadings 2.1.1 and 3.1.1) and a paintbrush that fits into the 50 mL centrifuge tube.

2.1.3 Frozen Sediment Cores

1. Large sterile scalpel tips, e.g., #22 (at least two per sample).
2. Fitting stainless steel scalpel handles (at a minimum two, preferably four or more).
3. Drawknives. Straight and 225 mm length (at least one for each core section sampled in a day).
4. Small kitchen knives with plastic handles (at least one for each sample/day).


5. Hole saws. Tungsten carbide tipped, one tooth, 25 mm diameter (one per sample and day; see Note 2).
6. Drill and two drill chucks (see Note 4).
7. Rectangular blocks (approximately 20 cm long, 10 cm wide, 10 cm high) to fix to the sides of the bench. One block is used to stabilize the frozen core sections during sampling; if a handheld drill is used, two further blocks are needed, on which the core section is placed during drilling.
8. A board that can support the core sections and has a hole of 7 cm diameter in the middle. For use with a handheld drill.
9. Bar clamps to fix the blocks tightly to the sampling bench.
10. Plastic sample tubes, e.g., tube 25 mL, 90 × 25 mm, polypropylene + cap (60.9922.243, Sarstedt), one per sample.
11. Squirt bottles of 2% deconex® 11 Universal solution, 5% bleach solution, distilled water, and absolute ethanol.
12. Four plastic beakers of about 500 mL, to rinse utensils on the bench.
13. DNA-ExitusPlus™ (AppliChem). One bottle (see Note 5).
14. DNA Away® to clean drill chucks and hole saws.
15. Aluminum foil.
16. Plastic wrap.
17. Bunsen burner.
18. UV cross-linker.
19. Small utensils: paper towels (three to four rolls), wiping cloth, trash bags, lab gloves, pens for labeling, and sticky tape.
20. Cleaning utensils. Toothbrushes, household sponges, and wiping cloths.
21. Clothing for inside the cold room: snow overall, boots, thermo-gloves, hairnets (three per person), protective sleeves (one pair per sample), nitrile gloves. See Note 6 on face masks.
22. Safety glasses or face shield (for the person performing the drilling).
23. Clothing for outside the cold room: coverall, face mask, hairnets, protective sleeves, and two pairs of nitrile gloves.
24. Optional (see Note 3): prepared tracer DNA mixed with absolute ethanol in a 50 mL centrifuge tube (see Subheadings 2.1.1 and 3.1.1) and a paintbrush that fits into the 50 mL centrifuge tube.

2.2 DNA Extraction of Sedimentary Material

1. Contents of the DNeasy PowerMax® Soil DNA Isolation Kit (Qiagen): empty 50 mL tubes, 50 mL tubes containing “PowerBeads,” spin filter tubes, and buffers C1–C6. Solution C1 is a sodium dodecyl sulfate solution and Solution C6 is a buffer consisting of 10 mM Tris.
2. Proteinase K solution at 2 mg/mL. 400 μL (equivalent to 0.8 mg) is required per sample.
3. Optional: 1 M dithiothreitol (DTT). 500 μL is required per sample (see Note 7).
4. p1000 pipette, 10 mL pipettor, and tips.
5. 1.5 mL microcentrifuge tube for storage of the extract. One per sample.
6. Vortexer with adapter for 50 mL tubes, e.g., Vortex-Genie® 2 with relevant adapter (available from Qiagen).
7. A centrifuge capable of holding 50 mL tubes and reaching 2500 × g.
8. Oven for overnight incubation at 56 °C, large enough to hold all extraction tubes on a nutating mixer or wheel. Ideally, the oven will have an electrical socket on the inside, where a nutating mixer/wheel can be plugged in, but other setups are possible.
9. Nutating mixer/wheel, capable of carrying ten or more 50 mL tubes and designed to operate at a temperature of 56 °C.

3

Methods

3.1 Sampling

To minimize the risk of contamination, we advise sampling cores from bottom to top, i.e., beginning with samples with a presumably lower concentration of (ancient) DNA before proceeding to samples with a presumably higher concentration of DNA [19]. Make sure that all persons involved in the sampling are dressed as described in Materials whenever handling or being near the open cores. It is advisable to work with at least two people to sample more efficiently. For safety reasons, drilling of frozen sediments and any work in a cold room should not be carried out alone.

3.1.1 Preparation of Tracer DNA (Optional)

Prepared tracer DNA can be spiked onto the outside of the core (see Note 3).

1. Set up a polymerase chain reaction (PCR) containing 1 μL of vector DNA, 0.4 μM of each primer T3 and T7, 1 mM of dNTPs, 2.5 mM of MgCl2, 1× Gold PCR buffer, and 0.8 U of AmpliTaq Gold DNA polymerase in a 50 μL reaction volume.
2. Run the PCR in a thermocycler under the following conditions: 2 min at 95 °C, followed by 40 cycles of 95 °C for 30 s, 52 °C for 30 s, and 72 °C for 30 s.


3. Mix the resulting PCR product with ~50 mL of absolute ethanol in a 50 mL centrifuge tube.

3.1.2 Nonfrozen Sediment Cores

Prior to sampling:
1. In an aDNA lab, clean the scalpel handles and small kitchen knives by soaking in bleach, rinsing with ethanol, and subjecting them to UV irradiation at a close distance (i.e., in a cross-linker) for 5 min on each side. In the cross-linker, place them on a clean sheet of aluminum foil, which can then be used to wrap all utensils together, and put them in a clean ziplock bag for transport to the sampling location.
2. Prepare squirt bottles with fresh solutions of deconex®, bleach, ethanol, and distilled water.
3. At the sampling location, scrub the bench and the walls with deconex® and wipe with distilled water. Then, wipe with DNA-ExitusPlus™ and finally with distilled water (see Note 5).
4. Prepare stands to mount the half cores (available in a sedimentology lab). Bring them to your sampling location and clean them thoroughly as described for the bench.
5. Prepare a core half for sampling and make sure to have a clean, undamaged surface. The samples will be taken from below this surface (see Note 8). Wrap the core half with clean plastic wrap until ready to sample.

Sampling:
1. Make sure to clean the workspace, equipment, and the outside of the core pipe once again on the day of sampling.
2. Place squirt bottles with distilled water, bleach, and ethanol together with a beaker for each on the bench.
3. Mount the cores, unwrap them, and leave the plastic wrap lying loosely over the core. When sampling, lift the wrap only off the part you are working on.
4. If your core cases do not have a length scale, fix the measuring tape on the table, and place the core alongside, starting at zero from the bottom of the core.
5. Optional: Use the paintbrush to spread the ethanol-diluted tracer DNA on the area(s) from which you will be sampling.
6. Clean the paintbrush and any other sampling utensils with the distilled water, bleach, and ethanol.
7. Preparation of the sampling syringes (see Note 9 on when not to sample with syringes and for an alternative sampling procedure). Place a clean piece of aluminum foil on the bench. Take one of the knives and a syringe, and let only the syringe tip touch the aluminum foil. Cut off the tip of the syringe to obtain a small open-ended syringe (i.e., a coring device). You can prepare five to eight syringes (not more) in one batch in this way, using the same clean knife. Place the cut syringes back into their sterile plastic wrap and cover this with aluminum foil.
8. Label the first set of sample tubes. As a minimum, label with the sample names. Depth information can be added afterward.
9. Removal of the first 0.5 cm of sediment. Take a sterile scalpel tip from its packaging, mount it on a clean scalpel holder, and carefully cut away/lift off the sediment around your sampling spot with a single cut. (If you have applied tracer DNA, store the surface sample for later monitoring as described in Note 3.)
10. Discard the scalpel tip. Rinse the scalpel handle with distilled water from a squirt bottle, and let it soak in the bleach-filled beaker.
11. Using a new scalpel tip, repeat step 9. Let the second cut start from within the cut surface of the first cut. Again, discard the scalpel tip and clean the scalpel handle as above.
12. Take one of the prepared syringes and insert it carefully into the sediment to take a sample. Make sure not to reach the outer edge of the core; this can generally be accomplished by restricting the sample volume to 2 mL.
13. Transfer the sample directly from the syringe into a labeled sample tube.
14. Discard the syringe.
15. Take the cleaned scalpel handles from the bleach, rinse with ethanol, and let them air-dry.
16. Repeat steps 9–15 for the next sample.
17. The samples should be stored at a temperature of −20 °C.
18. At the end of each sampling day, take all equipment that should be cleaned back to the aDNA lab, and proceed as in step 1 of “Prior to sampling.”

3.1.3 Frozen Sediment Cores

Prior to sampling:
1. In an aDNA lab, clean the scalpel handles, small kitchen knives, hole saws, and drill chucks in the following way: scrub with detergent (deconex®), rinse with water, wipe down with DNA Away®, and allow to air-dry. Then subject this equipment to UV irradiation at a close distance (i.e., in a cross-linker) for 5 min on each side. Include pieces of aluminum foil in the UV-irradiation procedure. Wrap each of the hole saws and pieces of equipment separately in foil immediately after UV irradiation.
2. Clean the drawknives. These should be scrubbed with detergent (deconex®), rinsed with water, wiped with DNA-ExitusPlus™, and rinsed with absolute ethanol. Residual ethanol can be removed by a Bunsen burner flame. Then leave on the bench overnight under UV irradiation.
3. Prepare squirt bottles with fresh solutions of deconex® and bleach (if needed), as well as squirt bottles with absolute ethanol and distilled water, and pack a bottle of DNA-ExitusPlus™.
4. At the sampling location: Before lowering the temperature, scrub the bench and the walls with deconex®, and wipe with distilled water. Then wipe with DNA-ExitusPlus™ and finally with distilled water (see Note 5).
5. Tightly fix a cleaned rectangular block to your sampling bench with a bar clamp. This will be used to stabilize your frozen sediment core section when cutting off the surface. If only a wooden block is available, cover with aluminum foil, and make sure your cores do not touch the wood. This is to avoid contamination of the core by DNA from the wooden block.
6. Prepare the sampling and drilling station, either by cleaning a fixed table drill or by fixing further blocks to the bench for work with a handheld drill (see Note 4).
7. Optional: Using a paintbrush, spread the ethanol-diluted tracer DNA on all the core section(s) that will be sampled.
8. Clean the paintbrush and any other sampling utensils with the distilled water, bleach, and ethanol mix between taking samples.

Sampling:
1. Take one core section out of the cold room (−10 °C) and let it thaw for 5 min. Then bring it back inside, and place it, lying on its flat, cut, inner surface, on aluminum foil on the bench. Stabilize it against a rectangular block (see Fig. 1a).
2. Removal of the outer sediment (Fig. 1a). Take a clean drawknife and remove the outer layer in one draw. If a second draw is needed, use a fresh knife. If you have applied tracer DNA, store the surface sample for later monitoring as described in Note 3.
3. Take a clean small kitchen knife and repeat the removal of the outer layer around the sampling spot.
4. Preparation of the workspace for drilling. Place a piece of aluminum foil on the table below the drill stand, onto which the drilled sample can drop.
5. Prepare a piece of aluminum foil, into which you can wrap the drilled sediment cylinder, when it is ready.
6. Mount the hole saw. Remove the aluminum foil from the back of the hole saw and secure it in the drill chuck. Remove the remaining aluminum foil just prior to drilling without touching the hole saw.
7. Drill a cylinder out of the sediment in one go, holding the drill vertically (Fig. 1b).
8. Wrap the sediment cylinder in aluminum foil and label it. It is important that the cylinder is wrapped in a timely manner.
9. Clean the drill chuck with absolute ethanol and distilled water, collect dirty knives and hole saws, and take them out of the cold room. They should be cleaned with water and a sterilized toothbrush.
10. Take the wrapped sediment cylinder out of the cold room, and, using fresh sterile scalpel blades, remove the ends of the cylinder.
11. Carefully put the cylinder in a prepared and labeled tube (with sample number, core, and depth information). The samples should be stored at a temperature of −20 °C.
12. At the end of each sampling day, take all equipment that should be cleaned back to the aDNA lab, and proceed as in step 1 of “Prior to sampling.”

Fig. 1 Sampling frozen sediment cores. (a) Removal of the outer sediment with a drawknife. The sediment core section is placed on aluminum foil on a bench with the flat inner surface facing down, and the outer sediment is removed in a single draw. (b) Drilling of the core section by inserting the core vertically into the cleaned surface


3.2 DNA Extraction of Sedimentary Material

Day 1:
1. Prepare the following tubes, supplied with the DNeasy PowerMax kit, with different kit solutions:
(a) PowerBead Solution. Add 15 mL to the PowerBeads Tubes.
(b) Solution C2. Add 5 mL to an empty 50 mL tube.
(c) Solution C3. Add 4 mL to an empty 50 mL tube.
(d) Solution C4. Mix well before use. Transfer 30 mL to an empty 50 mL tube.
2. Add the sediment sample (up to 10 g) to the tube containing the PowerBead Solution and beads (see Note 10).
3. Check whether Solution C1 is clear or contains a precipitate. If a precipitate has formed, heat at 60 °C until it dissolves. Add 1.2 mL of Solution C1 to the tube containing the PowerBead Solution and sediment.
4. Add 400 μL of the Proteinase K solution (see Note 11) and optionally 500 μL of DTT solution (see Note 7).
5. Vortex for 10 min at highest speed. This step allows for the mechanical breakdown of the sediment.
6. Incubate overnight on a nutating mixer/wheel in the incubation oven at 56 °C. This step allows for the chemical and enzymatic breakdown of the sediment.

Day 2:
1. Centrifuge tubes at 2500 × g for 3 min at room temperature.
2. Transfer the supernatant to the tube containing Solution C2, and invert twice to mix. Incubate at 4 °C or in a freezer for 10 min (see Note 12). The proprietary Solution C2 assists the removal of inhibitors.
3. Centrifuge the tubes at 2500 × g for 4 min at room temperature.
4. Avoiding the pellet, transfer the supernatant to the tube containing Solution C3, and invert twice to mix. Incubate at 4 °C or in a freezer for 10 min (see Note 12). The proprietary Solution C3 also assists the removal of inhibitors.
5. Centrifuge the tubes at 2500 × g for 4 min at room temperature.
6. Shake the tube containing Solution C4. Avoiding the pellet, transfer the supernatant to the tube containing Solution C4, and invert twice to mix. Solution C4 is a binding buffer (concentrated salt solution) that enables DNA to bind efficiently to the silica spin filter.


7. Perform a total of three centrifugations. Fill the spin filter with the mixture from step 6. Centrifuge at 2500 × g for 2 min at room temperature and discard the flow-through. Repeat this two more times or until all of the mixture from step 6 has passed through the spin filter.
8. Add 10 mL of Solution C5 to the spin filter, and centrifuge at 2500 × g for 3 min at room temperature. Discard the flow-through. Solution C5 is an ethanol-based wash buffer that removes excess salts and other impurities from the filter.
9. Centrifuge the empty spin filter at 2500 × g for 5 min at room temperature. This is to remove any residual Solution C5 that can inhibit downstream applications.
10. Carefully place the spin filter in a new collection tube.
11. To elute the DNA, add up to 5 mL of sterile Solution C6 to the center of the spin filter membrane (but see Note 13), incubate for 5–10 min, and centrifuge at 2500 × g for 3 min at room temperature. Discard the spin filter.
12. Prepare aliquots of the DNA extracts (see Note 14) and store these at −20 °C.
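When planning a batch of extractions, it can help to total the per-sample volumes given in the steps above before opening the kit. The following sketch sums these volumes for a chosen number of samples and extraction blanks. The number of blanks and the pipetting overage are assumptions chosen for illustration, the elution volume follows Note 13 (1.6 mL) rather than the 5 mL maximum, and the names used are hypothetical.

```python
# Per-sample volumes (mL) taken from the Day 1 and Day 2 steps.
PER_SAMPLE_ML = {
    "PowerBead Solution": 15.0,
    "Solution C1": 1.2,
    "Solution C2": 5.0,
    "Solution C3": 4.0,
    "Solution C4": 30.0,
    "Solution C5": 10.0,
    "Solution C6 (elution, Note 13)": 1.6,
    "Proteinase K (2 mg/mL)": 0.4,
}
DTT_ML = 0.5  # optional 1 M DTT per sample (see Note 7)


def reagent_totals(n_samples: int, n_blanks: int = 1,
                   overage: float = 0.10, use_dtt: bool = False) -> dict:
    """Total volume (mL) of each reagent for a batch.

    n_blanks and overage are planning assumptions, not protocol values.
    """
    if n_samples < 0 or n_blanks < 0 or overage < 0:
        raise ValueError("counts and overage must be non-negative")
    per_sample = dict(PER_SAMPLE_ML)
    if use_dtt:
        per_sample["DTT (1 M, optional)"] = DTT_ML
    n_total = n_samples + n_blanks
    return {name: vol * n_total * (1 + overage)
            for name, vol in per_sample.items()}


if __name__ == "__main__":
    for name, total in reagent_totals(10, n_blanks=2, use_dtt=True).items():
        print(f"{name}: {total:.1f} mL")
```

Including extraction blanks in the count keeps the blanks on the same reagent lot as the samples, which is what makes them informative about contamination.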

4

Notes

1. A protocol focusing on the extraction of extracellular DNA [17] has successfully been used in studies on sedaDNA [12, 13] but in other cases has been less efficient than the protocol we describe here (Alsos, I.G. pers. comm., and unpublished data).
2. Clean sampling of sediment cores is time intensive, especially for frozen samples. With the protocol presented here, we estimate that two people can take about 30–40 nonfrozen samples in a day without rushing. For frozen samples, and with the involvement of three people, we have managed to take a maximum of 15 samples in a day.
3. To ensure that the samples have been hermetically sealed from the outside until the point of sampling and that no contamination is introduced from the core surface during sampling, it is possible to spike the core with tracer DNA [1, 7, 18]. This tracer DNA is spread on the surface of the core prior to sampling (or can be spread during core retrieval in the field) [1, 7]. The core surface is then removed, and the sample is taken. A DNA extraction is performed both of the removed surface and of the samples to be analyzed, and the presence of the tracer DNA is checked via polymerase chain reaction (PCR) on all samples. This procedure is currently not performed in all studies using sedaDNA, and, in our opinion, it is most important when isolated samples are analyzed. When investigating a sequence of samples from sediment cores, we consider it more important to evaluate the consistency of the results, their fit into the stratigraphic context, and their congruence with other proxies. Further, when using a PCR product as tracer DNA (see Subheadings 2 and 3), one has to ensure that none of the potential target DNA (e.g., PCR products of plant DNA) can get into the tracer during its production, and this is often not possible. We therefore list this step as optional and suggest that researchers should consider carefully whether it should be carried out.
4. It is possible to perform the drilling both with a fixed table drill and a handheld drill. If using a handheld drill, make sure to prepare a stand for drilling. Fix two blocks of at least 10 cm height, covered with a board that will support your core section during drilling, to the edge of the workbench with bar clamps. The board should have a hole of about 7 cm diameter through which the drilled sample can fall onto the bench. Cover the board with aluminum foil and place a piece of aluminum foil on the bench below.
5. Cleaning can also be accomplished with 5% bleach and 70% ethanol, as long as the temperature is above 0 °C. When sampling frozen cores, these liquids will freeze, so we only wipe with absolute ethanol and spread changeable aluminum foil during sampling. When cleaning with DNA-ExitusPlus™, we make sure that it is not sprayed further than it should be by spraying the solution onto a paper towel and using this to wipe the core. DNA-ExitusPlus™ can be removed from surfaces by wiping with water.
6. Wearing face masks at a temperature below freezing can lead to condensation forming on the outside of the mask and possibly dripping during sampling. We therefore do not recommend wearing face masks in the cold room, but rather suggest using face shields for safety reasons.
7. We have not performed a systematic comparison of extractions with and without the addition of DTT. We have in particular used DTT in extractions of frozen sediments. Both versions of the protocol have yielded good results in our experience.
8. Opening of the cores and preparation of core halves is performed according to standard procedures in a sedimentological lab, which we do not describe here. Of course, it is also possible to sample frozen cores that have not been halved in the same way as described here (see ref. 18).
9. While we recommend sampling with syringes whenever possible, some sediments, especially if they contain a high clay content, are too dense and glutinous for this procedure. In this case, use sterile scalpels/scalpel blades to cut out a cube from the sampling spot. Use a new scalpel blade for each cut. Place the retrieved block on a clean piece of aluminum foil, and cut another thin slice from each side, always moving the sample to a clean spot on the foil before cutting.
10. A good option is to add the sample to the tube on a scale and note the added weight directly.
11. When adding Proteinase K to the lysis mix, it is advisable not to vortex the sample prior to this, because a lot of foam is produced when vortexing Solution C1.
12. The manufacturer’s protocol asks for incubation at 4 °C. If that is not possible, incubation in a freezer is, in our experience, also fine, and the samples will not freeze during this period.
13. We generally elute DNA in 1.6 mL of Solution C6 (pipetted as 2 × 800 μL) and transfer 1.4 mL to a clean 1.5 mL microcentrifuge tube for storage.
14. We advise that two aliquots of each extraction are prepared immediately after elution, so that, if contamination is detected in an extraction blank of one lot of aliquots, subsequent reactions can immediately be performed on the other lot of aliquots.
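The bottom-to-top sampling order recommended in Subheading 3.1 and the daily throughput figures in Note 2 can be combined into a simple sampling plan. The sketch below is illustrative only: it assumes that sample positions are recorded as depth below the core top in cm (so that the largest depth is the bottom of the core), and it uses the lower bound of 30 samples per day for nonfrozen cores and the reported maximum of 15 per day for frozen cores. The function and variable names are hypothetical.

```python
from typing import List

# Daily throughput taken from Note 2 (lower bound for nonfrozen cores,
# reported maximum for frozen cores).
SAMPLES_PER_DAY = {"nonfrozen": 30, "frozen": 15}


def sampling_schedule(depths_cm: List[float],
                      core_type: str = "nonfrozen") -> List[List[float]]:
    """Split sample depths into sampling days, deepest samples first.

    Depths are assumed to be measured downward from the core top, so
    sorting in descending order gives a bottom-to-top sampling order.
    """
    if core_type not in SAMPLES_PER_DAY:
        raise ValueError("core_type must be 'nonfrozen' or 'frozen'")
    per_day = SAMPLES_PER_DAY[core_type]
    ordered = sorted(depths_cm, reverse=True)
    return [ordered[i:i + per_day] for i in range(0, len(ordered), per_day)]


if __name__ == "__main__":
    depths = [10, 250, 40, 120, 300, 75]  # hypothetical depths in cm
    for day, batch in enumerate(sampling_schedule(depths, "frozen"), start=1):
        print(f"Day {day}: {batch}")
```

A plan of this kind also gives the number of knives, hole saws, and tubes that must be cleaned and packed per day according to the Materials lists.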

Acknowledgments This work was supported by the DFG grant EP98/2-1 to LSE. We thank Inger G. Alsos for exchange of experiences concerning the DNA extraction protocols.

Chapter 6 Extraction of Ancient DNA from Plant Remains Nathan Wales and Logan Kistler Abstract Ancient plant remains from archaeological sites, paleoenvironmental contexts, and herbaria provide excellent opportunities for interrogating plant genetics over Quaternary timescales using ancient DNA (aDNA)-based analyses. A variety of plant tissues, preserved primarily by desiccation and anaerobic waterlogging, have proven to be viable sources of aDNA. Plant tissues are anatomically and chemically diverse and therefore require optimized DNA extraction approaches. Here, we describe a plant DNA isolation protocol that performs well in most contexts. We include recommendations for optimization to retain the very short DNA fragments that are expected to be preserved in degraded tissues. Key words Ancient plant DNA, DNA extraction, Archaeogenomics, Archaeobotany, Paleoethnobotany

1

Introduction Ancient and historic plant tissues often contain traces of a range of biomolecules including lipids, proteins, RNA, and DNA [1–5]. While lipids and proteins tend to be the most resistant to degradation [6], genetic data provide a powerful means to investigate research questions at the gene, population, or species level. Researchers have characterized ancient plant DNA primarily from archaeological remains of domesticated crops, but aDNA approaches can be used to study a variety of evolutionary or ecological topics using DNA recovered from paleoenvironmental or herbarium tissue, such as exploring the change in range and diversity of climate indicator species like oaks (Quercus spp.) in late Holocene Europe [7]. Several DNA isolation protocols have been developed to target ancient plant remains. Unlike the largely consistent chemical makeup of animal skeletal remains, plant tissues vary considerably in their biochemical composition, both according to taxon and tissue type. During the life of a plant, compounds such as polysaccharides and polyphenols play vital roles in energy storage,

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_6, © Springer Science+Business Media, LLC, part of Springer Nature 2019


cellular structure, pigmentation, and pathogen resistance. These large molecules interfere, however, with the manipulation of DNA from fresh leaves and other living plant tissues. To overcome this, many researchers use commercially developed kits (e.g., Qiagen DNeasy Plant Mini Kit) or the CTAB (cetyltrimethylammonium bromide) protocol [8] to isolate pure DNA from plant tissues. In applying these methods to degraded plant samples, ancient DNA researchers have observed that the CTAB method is useful for many archaeobotanical and herbarium specimens [9, 10] and found success with commercial kits on some materials [11–13]. Slight modifications to these methods, such as the addition of PTB (N-phenacylthiazolium bromide) to digestion buffers [9, 13], also improve DNA recovery in particularly recalcitrant specimens. However, the recovery of plant ancient DNA still lags behind that of animal DNA, most likely because of differences in preservation. Most archaeobotanical remains that have yielded recoverable DNA were preserved via desiccation or waterlogging, for example. In addition, while some success has been reported in recovering ancient DNA from charred remains using PCR-based methods [14–17], the majority of charred samples contain little endogenous DNA [18]. The plummeting cost of high-throughput sequencing has enabled researchers to explore ancient plant remains at the genomic scale [19–21], and this development necessitates a reevaluation of aDNA extraction methods. In order to generate sufficient data for plant paleogenomic projects, DNA extractions must maximize DNA yield while preventing carryover of other biomolecules that may interfere with the construction of high-throughput sequencing libraries. Moreover, it is essential to retain short fragments of endogenous DNA ( 10,000 × g) and 15 ml tubes (rcf > 3000 × g), dependent on size of digestions and purifications to be performed. 7. Oven prepared at 55 °C, within which a rotary device (below) can be placed. 8. Rotary mixer, wheel, or similar device to keep samples constantly in motion during incubation steps. 9. Tabletop vortex. 10. Sterile 1.5 ml and 15 ml tubes, depending on the size of the extraction being performed.

2.2 For Silica Column Purification (Subheading 3.3)

1. Qiaquick DNA purification kit (Qiagen, Valencia, CA) including “Qiaquick” silica columns and buffers “PB,” “PE,” and “EB.”

2.3 For Organic Purification (Subheading 3.4)

1. Tris-buffered phenol, pH 8.0. 2. Chloroform. 3. Isopropanol. 4. 3 M sodium acetate, approximate pH 5. 5. (Optional) DNA precipitation “carrier,” e.g., GlycoBlue (Ambion, Inc., Austin, TX). 6. “Molecular biology grade” ethanol, 85%. 7. TE elution buffer: 10 mM Tris–HCl, 1 mM EDTA (pH 8.0).

DNA Extraction from Keratin and Chitin

3


Methods Carry out all procedures at room temperature unless otherwise specified. Always incorporate extraction blanks in the analysis. This protocol assumes the use of pure keratin or chitin, e.g., hair, horn, nail, feather, and arthropod exoskeleton/wing carapaces.

3.1 Tissue Prepreparation

1. For most materials proceed directly to step 2. For large pieces of nail or horn, drill a suitable amount of powder (e.g., 100 mg) directly from the specimen, and collect the powder in an appropriate container. 2. For non-powdered material, clean the tissue via a brief wash in dilute bleach solution, taking care to remove all obvious sources of contaminant matter. For powdered material, clean the powder by immersing it in the bleach solution for 10–20 s, and then briefly centrifuge the mixture to pellet the powder. Pour off the bleach. 3. Rinse material several times in ddH2O to remove all traces of bleach (see Note 2). For powdered material, use a vortex to ensure the pellet from step 2 is homogenized after adding the water. After 10–20 s of incubation, re-pellet the powder. Pour off the ddH2O; then repeat.

3.2

Tissue Digestion

1. Add 40 μl 1 M DTT solution and 100 μl proteinase K solution per 860 μl stable digestion buffer to make the active digestion buffer. Mix well (see Note 3). 2. Add a suitable amount of digestion buffer to the sample (see Note 3). Vortex briefly to ensure that any pelleted powder is homogenized in the solution. Incubate the sample plus buffer overnight at 55 °C with gentle agitation. 3. Keratinous samples may not fully digest after this incubation. If full digestion is required, add an additional 40 μl 1 M DTT solution and 100 μl proteinase K solution to the mixture, vortex briefly, and return to incubation with agitation for at least 1 more hour. Chitinous samples rarely fully digest; however in both tissues DNA is usually liberated into solution even if digestion does not appear to be complete upon visual inspection. 4. Proceed to DNA purification using either the silica (Subheading 3.3) or organic (Subheading 3.4) method (see Note 4).
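The ratio in step 1 (40 μl DTT and 100 μl proteinase K per 860 μl stable buffer) can be scaled to whatever batch volume is needed. The following is a minimal sketch of that arithmetic; the function name and the 2.5 ml example batch are illustrative assumptions, not part of the protocol.

```python
# Component volumes (in ul) per 1000 ul of active digestion buffer, from step 1.
PARTS_PER_1000_UL = {
    "DTT (1 M)": 40.0,
    "proteinase K solution": 100.0,
    "stable digestion buffer": 860.0,
}


def active_digestion_buffer(total_ul):
    """Return the volume of each component needed for total_ul of active buffer."""
    if total_ul <= 0:
        raise ValueError("total_ul must be positive")
    scale = total_ul / 1000.0
    return {name: round(vol * scale, 2) for name, vol in PARTS_PER_1000_UL.items()}


if __name__ == "__main__":
    # Hypothetical batch: 2.5 ml of active buffer for several samples.
    for name, vol in active_digestion_buffer(2500).items():
        print(f"{name}: {vol} ul")
```

Because the active buffer is not stable (see Note 1), it may be preferable to compute only the volume needed for the samples at hand rather than a fixed large batch.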

3.3 DNA Purification: Silica Method (See Note 5)

1. Centrifuge the digestion mixture for 3–5 min at high speed (>10,000 × g) to pellet any solid remains. Carefully pipette the liquid fraction of the digestion into a new tube. Avoid transferring any solids that may block the spin filter.


Paula F. Campos and M. Thomas P. Gilbert

2. Add five volumes Qiaquick buffer PB to the solution. 3. Mix thoroughly. 4. Add 700 μl of this mixture to the Qiaquick spin column. 5. Centrifuge for 1 min at 6000 × g. This speed is useful to limit how much target DNA passes through the filter without binding. However, if the liquid does not pass through the filter in 1 min, the speed can be increased. 6. Empty the liquid waste from the spin column (see Note 6). Repeat steps 4–6 with the remaining PB buffer-digestion mix until all the liquid has passed through the spin column. 7. Add 500 μl Qiagen wash buffer PE to the filter. 8. Centrifuge for 1 min at 10,000 × g. Empty the waste and repeat if extra purity is required. 9. Centrifuge for 3 min at maximum speed to dry the filter. Any residual ethanol from the PE buffer will inhibit downstream applications. 10. Place the filter in a new 1.5 ml tube. Add 50–100 μl Qiagen elution buffer EB directly to the center of the filter, and leave at room temperature for 5 min (see Note 7). EB can be replaced with molecular biology grade ddH2O (pH 7–8). 11. Centrifuge for 1 min at maximum speed to collect the EB and DNA.

3.4 DNA Purification: Organic Extraction (See Note 8)

1. Add phenol (pH 8) to the digestion mix at a ratio of 1:1 with the total digestion volume. 2. Agitate gently at room temperature for 5 min. 3. Centrifuge for 5 min to separate the layers. The speed of centrifugation will depend on the volume of the digestion mix, the centrifuge capacity, and the maximum speed designation of the tubes being used. It is generally advisable to use the maximum speed possible. If after 5 min the layers have not fully separated, extend the centrifugation time. 4. Carefully remove the upper aqueous layer. Be careful not to remove the protein-containing interface. Discard the lower, phenol layer (see Note 8). 5. Add the aqueous layer to 1 volume of fresh phenol. Repeat steps 2–4 in Subheading 3.4. After the second centrifugation, add the aqueous layer to 1 volume chloroform. 6. Agitate gently at room temperature for 5 min. 7. Centrifuge for 5 min to separate the layers. Remove the upper aqueous layer. Discard the lower, chloroform layer (see Note 8). 8. Add 0.6–1 volume isopropanol and 0.1 volume 3 M sodium acetate (approx. pH 5). A small amount of commercial carrier


solutions can also be added if required to facilitate pellet visualization, such as GlycoBlue (Ambion, Inc., Austin, TX), following the manufacturer’s guidelines. Mix well (see Note 9). 9. Immediately centrifuge at high speed (>10,000 × g) for 30 min at room temperature. 10. Immediately following centrifugation, decant the liquid from the tube carefully. The DNA will have precipitated into a pellet at the bottom of the tube and may not be visible. 11. To rinse the pellet, gently add 500–1000 μl 85% ethanol, slowly invert the tube once, and then centrifuge for 5 min at high speed. 12. Gently decant the ethanol. Repeat if necessary. 13. All ethanol must be removed from the pellet as any residual ethanol will inhibit downstream applications. This can be achieved by using a small bore pipette and by briefly incubating the dry pellet at a relatively high temperature (e.g., 55–75 °C). 14. Resuspend the pellet in elution TE buffer or ddH2O. If the pellet has become very dry, this may require leaving the pellet at room temperature in the liquid for 5–10 min, followed by gentle pipetting (see Note 10).
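The precipitation in step 8 is specified in volumes relative to the recovered aqueous phase, so the absolute amounts depend on how much liquid was carried through the organic extraction. The sketch below works out those amounts; the function name, the returned keys, and the 1500 μl tube capacity check (motivated by the advice in Note 9 to precipitate in 1.5 ml tubes or smaller) are assumptions made for illustration.

```python
def precipitation_volumes(aqueous_ul, isopropanol_ratio=1.0):
    """Volumes (ul) for isopropanol precipitation of an aqueous DNA phase.

    isopropanol_ratio must lie in the 0.6-1 volume range given in step 8;
    3 M sodium acetate is added at 0.1 volume.
    """
    if aqueous_ul <= 0:
        raise ValueError("aqueous_ul must be positive")
    if not 0.6 <= isopropanol_ratio <= 1.0:
        raise ValueError("isopropanol_ratio must be between 0.6 and 1.0")
    isopropanol = aqueous_ul * isopropanol_ratio
    sodium_acetate = aqueous_ul * 0.1
    total = aqueous_ul + isopropanol + sodium_acetate
    return {
        "isopropanol_ul": round(isopropanol, 1),
        "sodium_acetate_ul": round(sodium_acetate, 1),
        "total_ul": round(total, 1),
        # Assumed nominal capacity of a 1.5 ml tube, ignoring any carrier added.
        "fits_1p5_ml_tube": total <= 1500,
    }


if __name__ == "__main__":
    print(precipitation_volumes(500, isopropanol_ratio=0.6))
```

If the total does not fit a small tube, Note 9 suggests concentrating the liquid first rather than precipitating in a larger vessel.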

4

Notes 1. Neither DTT nor proteinase K is stable once added to the active digestion solution; thus the active buffer needs to be freshly made for each digestion. At 4 °C, the SDS will precipitate out of solution. Prior to the addition of DTT and proteinase K, the buffer should be warmed up until the SDS is fully dissolved. 2. Any bleach carryover will degrade the DNA and reagents in subsequent steps of the DNA extraction; thus it is extremely important that bleach is removed completely. 3. The volume of digestion buffer needed is sample dependent but generally should be at least sufficient to cover the surface of the material. 4. DNA can be purified from the digestion mixture in a number of different ways. Selecting a method depends ultimately on convenience and user preference. For small volumes, silica spin columns are convenient, but for larger volumes these rapidly become very labor intensive. For larger volumes of digestion mix (e.g., >1 ml), organic extractions are often preferable, in particular if large amounts of undigested melanin, dirt, or other materials are present in solution, as these tend to block silica filters. For a silica protocol, refer to Subheading 3.3. For organic purification refer to Subheading 3.4.


5. As recommended by Yang et al. [6], Qiagen’s “Qiaquick” PCR cleanup kits are an excellent and quick tool for purifying DNA. The instructions in the kit manual can be followed almost directly if one replaces the phrase “PCR product” with “DNA extract.” The only change we recommend is the modification of the centrifugal speeds. 6. Qiagen buffers contain guanidinium salts, and relevant local disposal regulations should be consulted. 7. The volume of EB to use in this step depends on the final concentration of DNA required and can be modified. 8. Organic extractions use phenol and chloroform to help purify the DNA. Both phenol and chloroform are toxic, and phenol in particular is extremely dangerous. Neither should be used without appropriate training. Always handle both liquids and their containers with extreme care, using appropriate face, hand, and body protection. Do not handle using latex gloves, as these are permeable to phenol and chloroform; use only nitrile gloves. The fumes of both chemicals are dangerous; therefore these steps should always be performed in a vented fume hood. Disposal of both requires compliance with specific regulations; thus relevant local disposal regulations should be consulted. 9. Isopropanol precipitation is most effective at relatively high centrifugal forces and in small tubes (the DNA pellet is easiest to spot and re-suspend if 1.5 ml tubes or smaller are used). If large volumes are to be precipitated, we recommend first concentrating the liquid, for example, with a centrifugal concentrator such as an Amicon Centricon (Millipore, Billerica, Massachusetts) with 30 kD or less molecular weight cutoff. 10. Melanin pigments often co-purify with the DNA and coprecipitate with it during isopropanol precipitation. This results in a brown concentrated DNA pellet and a brown extract after resuspension. As melanin can inhibit enzymatic reactions (e.g., PCR), an additional purification step may be followed, for example, using a silica procedure (e.g., following Subheading 3.3).

Acknowledgment M.T.P.G. was supported by the Danish National Science Foundation’s “Skou” grant program.


References 1. Bonnichsen R, Hodges L, Ream W et al (2001) Methods for the study of ancient hair: radiocarbon dates and gene sequences from individual hairs. J Archaeol Sci 28:775–785 2. Gilbert M, Wilson A, Bunce M et al (2004) Ancient mitochondrial DNA from hair. Curr Biol 14:463 3. Rawlence N, Wood J, Armstrong K et al (2009) DNA content and distribution in ancient feathers and potential to reconstruct the plumage of extinct avian taxa. Proc R Soc Biol Sci Ser B 276:3395

4. Willerslev E, Gilbert MT, Binladen J et al (2009) Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution. BMC Evol Biol 9:95 5. King G, Gilbert M, Willerslev E et al (2009) Recovery of DNA from archaeological insect remains: first results, problems and potential. J Archaeol Sci 36:1179–1183 6. Yang DY, Eng B, Waye JS et al (1998) Technical note: improved DNA extraction from ancient bones using silica-based spin columns. Am J Phys Anthropol 105:539–543

Chapter 8 Double-Stranded Library Preparation for Ancient and Other Degraded Samples Kirstin Henneberger, Axel Barlow, and Johanna L. A. Paijmans Abstract High-throughput sequencing (HTS) allows fast and cost-efficient sequencing of ancient DNA (aDNA) without prior information about what sequences should be targeted. One necessary step for HTS is the preparation of a sequencing library. Commercial kits are available for this purpose, but many of these are not suitable for aDNA or other types of damaged DNA. Here, we outline a protocol for HTS library preparation that is optimized for ancient DNA. We report the library conversion rate for a range of input template and adapter concentrations. Our results show that the protocol performs at a high efficiency. Key words Double-stranded library preparation, Degraded DNA, Ancient DNA, Historical DNA, Illumina sequencing

1

Introduction The establishment of high-throughput sequencing (HTS) has advanced considerably the field of ancient DNA (aDNA) research. HTS provides several advantages over traditional (Sanger) sequencing. These include the vastly reduced per-nucleotide sequencing costs, the ability to sequence DNA fragments without prior information about what sequences to target, and the potential to obtain useful sequence information from extremely short DNA fragments. HTS requires preparation of a sequencing library from the DNA template, which involves the ligation of universal sequence adapters to both ends of the DNA molecules. A number of commercial library preparation protocols are available and have been developed for modern, undamaged DNA (e.g., Nextera DNA Library Prep Kit from Illumina, KAPA HTP Library Preparation Kit from KAPA Biosystems). However, these protocols are unsuitable for ancient or other damaged DNA extracts, as these methods are typically associated with low rates of conversion into the sequencing library [1–3]. This is particularly detrimental for low-quality samples when DNA may only be present in trace amounts.

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_8, © Springer Science+Business Media, LLC, part of Springer Nature 2019


The optimization of library preparation protocols thus represents a crucial step in the analysis of ancient or degraded DNA. Several library preparation protocols have been developed which mitigate the specific problems associated with aDNA [4–6]. Protocols developed for Illumina sequencing platforms can be broadly divided into two categories: those requiring double-stranded input DNA and those requiring single-stranded DNA. The latter have been shown to achieve much higher rates of conversion of very short aDNA molecules [7] and can be considered the best approach for samples with extremely small amounts of highly fragmented DNA. However, the double-stranded library preparation protocol is still used regularly in aDNA studies [8–12]. The double-stranded protocol may be particularly suitable for archival samples, which may have lower rates of DNA fragmentation and higher overall DNA quantity compared to ancient (e.g., Pleistocene) samples. In addition, double-stranded DNA protocols involve less handling time and lower cost than existing single-stranded protocols, which tend to require specialized and expensive oligonucleotides and lab equipment (e.g., thermomixer with cooling function, magnetic racks). Here, we outline a double-stranded library preparation method that is based on previously published protocols [5, 6, 13]. The double-stranded protocol involves blunt-end ligation of truncated adapters to template DNA, which are then extended to full adapters during indexing PCR. Some of the optimizations we present compared to the first double-stranded protocol [13] are the reduction of purification steps (after [6]), the introduction of a second index in the P5 adapter allowing for double indexing of libraries (to avoid sequencing artifacts, [14]), and, finally, the use of qPCR to estimate the optimal cycle number to avoid under- or overamplification of the library (after [5]).
We additionally investigated the effect of changing the concentration of the adapter mix added during the ligation. In the double-stranded library preparation protocol presented by Meyer and Kircher [13], it is advised to reduce the amount of adapter mix when the input DNA quantity is low (50 ng or less). For DNA extracted from degraded samples, the total input amount is often lower than this threshold. However, the effect of adjusting the adapter mix concentration on the conversion rate of template molecules into library molecules, as well as the formation of adapter dimers during amplification, is not clearly understood (see Note 1). We measured the conversion rate at seven different adapter concentrations ranging from 0.5 to 2.5 μM, using either 50 ng or 500 ng of synthetic input DNA. The conversion rate was tested by quantitative PCR using two sets of primers: one primer pair that binds to the template and one primer pair that amplifies from the ligated adapters (library molecule), i.e., a molecule with both P5 and P7 adapters successfully ligated (Fig. 1). As the ligation of the adapter mix to the DNA molecules is random,

Fig. 1 Binding sites of primer pairs which were used to calculate the conversion rate CR. Primers IS7 and IS8 bind within the ligated adapters; primers P100_fwd and P100_rev bind within the template. The conversion rate is calculated as

\[ \mathrm{CR} = \frac{1}{2^{\,C_q(\mathrm{IS7/IS8}) - C_q(\mathrm{P100\_fwd/P100\_rev})}} \times 100\,\% \]

Fig. 2 Conversion rates measured with qPCR (average values across three replicates; see Note 10) at different DNA input quantities (500 and 50 ng). A range of adapter mix from 0.5 μM to 2.5 μM was used. [Plot: conversion rate in percent (y-axis, 0–100) against concentration of adapter mix in μM (x-axis, 0.5–2.5), with separate series for 500 ng and 50 ng input]

half of the library molecules will have either P5 or P7 adapters on both sides, and these molecules are lost during amplification. Thus, the conversion rate of double-stranded library preparation has a theoretical maximum of 50%. We found the average conversion rate of template to library molecules to be 47.4% across all adapter concentrations, with no clear trend relative to adapter mix concentration or input amount (Fig. 2). Given this theoretical maximum of 50%, the efficiency of the current protocol is very close to the maximum. Our results suggest that using 0.5 μM adapter mix is appropriate for both low and high input DNA amounts. The double-stranded


library preparation presented here is fast, cost-efficient, and especially well-suited for archival samples with moderate fragmentation and higher quantities of DNA, relative to ancient samples.
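The conversion rate described above follows the relation given with Fig. 1, CR = 1 / 2^(Cq(IS7/IS8) − Cq(P100_fwd/P100_rev)) × 100%. A minimal sketch of that calculation is shown here; it assumes perfect doubling per cycle (as the base of 2 implies), and the Cq values in the usage example are invented for illustration, not measurements from this chapter.

```python
from statistics import mean


def conversion_rate_percent(cq_library, cq_template):
    """CR = 100 / 2 ** (Cq(IS7/IS8) - Cq(P100_fwd/P100_rev)).

    cq_library:  Cq of the adapter-binding primer pair (IS7/IS8).
    cq_template: Cq of the template-binding primer pair (P100_fwd/P100_rev).
    """
    return 100.0 / 2.0 ** (cq_library - cq_template)


def mean_conversion_rate(cq_pairs):
    """Average CR over replicate (cq_library, cq_template) pairs."""
    return mean(conversion_rate_percent(lib, tmpl) for lib, tmpl in cq_pairs)


if __name__ == "__main__":
    # Made-up Cq values for three replicates, for illustration only.
    replicates = [(21.0, 20.0), (21.2, 20.1), (20.9, 19.9)]
    print(f"mean CR: {mean_conversion_rate(replicates):.1f} %")
```

A library Cq exactly one cycle later than the template Cq corresponds to the theoretical maximum of 50% discussed above.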

2 Materials

2.1 Equipment

1. Cleanlab (= pre-PCR lab). 2. Thermal cycler (cleanlab, post-PCR lab) for 0.2 ml PCR tubes. 3. Microcentrifuge (cleanlab, post-PCR lab) for 1.5 ml tubes and 8-strip PCR tubes. 4. Vortex mixer (cleanlab). 5. Centrifuge for 96-well plates (post-PCR lab). 6. PCR plates (96-well, 200 μl capacity) and adhesive PCR plate seals. 7. Racks for 0.5 ml and 1.5 ml tubes. 8. Low-retention microcentrifuge tubes (1.5 ml). 9. Real-time PCR cycler compatible with SYBR Green qPCR kits. 10. Tapestation or Bioanalyzer. 11. Qubit fluorometric quantitation system and high-sensitivity reagents.

2.2

Reagents

1. Water, HPLC grade. 2. Buffer Tango (with BSA) (10×). 3. dNTP mix (25 mM each). 4. ATP (100 mM). 5. T4 polynucleotide kinase (10 U/μl). 6. T4 DNA polymerase (5 U/μl). 7. T4 DNA ligase buffer (10×). 8. PEG-4000 (50%). 9. T4 DNA ligase (5 U/μl), with corresponding buffer and PEG-4000. 10. Bst DNA polymerase, large fragment (8 U/μl), supplied with corresponding ThermoPol Reaction Buffer (10×). 11. AccuPrime Pfx DNA Polymerase (2.5 U/μl), supplied with corresponding AccuPrime reaction mix (10×) (see Note 2). 12. Qiagen MinElute PCR purification kit (contains MinElute columns, Buffer PB, Buffer PE, and Buffer EB). 13. SYBR® Green PCR Master Mix. 14. 1 M Tris–HCl (pH 8.0). 15. 0.5 M EDTA (pH 8.0).


16. 5 M NaCl. 17. 10% Tween-20.

2.3 Reagent Setup/Preparing Buffer

1. Adapter mix (20 μM each): see Subheading 2.5. 2. Oligo hybridization buffer (10×): 500 mM NaCl, 10 mM Tris–HCl (pH 8.0), 1 mM EDTA (pH 8.0). 3. TET buffer: 10 mM Tris–HCl (pH 8.0), 1 mM EDTA (pH 8.0), 0.05% Tween-20.

2.4

Oligonucleotides

DNA sequences are shown in 5′-3′ direction, and * indicates a PTO (phosphorothioate) bond: 1. IS1_adapter.P5: A*C*A*C*TCTTTCCCTACACGACGCTCTTCCG*A*T*C*T. 2. IS2_adapter.P7: G*T*G*A*CTGGAGTTCAGACGTGTGCTCTTCCG*A*T*C*T. 3. IS3_adapter.P5+P7: A*G*A*T*CGGAA*G*A*G*C. 4. IS7: ACACTCTTTCCCTACACGAC. 5. IS8: GTGACTGGAGTTCAGACGTGT. 6. P100_fwd: GGATAGACCTGGTAGTGCAATCC. 7. P100_rev: CGTCAAACCCACGAGCTAGA.

2.5 Preparation of Double-Stranded Adapter Mix

1. Combine 1 μl IS1_adapter.P5 (500 μM), 1 μl IS3_adapter.P5+P7 (500 μM), 1.25 μl oligo hybridization buffer (10×), and 9.25 μl water in a PCR tube. 2. Combine 1 μl IS2_adapter.P7 (500 μM), 1 μl IS3_adapter.P5+P7 (500 μM), 1.25 μl oligo hybridization buffer (10×), and 9.25 μl water in a PCR tube. 3. Incubate the two reaction mixtures for 12 s at 95 °C, followed by a ramp from 95 °C to 12 °C by using a rate of 0.1 °C/s. 4. Pool both reactions to get 25 μl adapter mix (20 μM each adapter).
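The stated 20 μM final concentration can be checked with a simple dilution calculation (C1·V1 = C2·V2) applied first to each hybridization tube and then to the pooled mix. The sketch below does this; the function names are illustrative, and the only inputs are the volumes and the 500 μM stock concentration given in steps 1–4.

```python
# Volumes (ul) in each hybridization tube: two oligos, 10x buffer, water.
HYBRIDIZATION_VOLUMES_UL = [1.0, 1.0, 1.25, 9.25]


def diluted_concentration(stock_um, stock_ul, total_ul):
    """Concentration (uM) after diluting stock_ul of stock_um into total_ul."""
    return stock_um * stock_ul / total_ul


def adapter_mix_concentrations(oligo_stock_um=500.0):
    """Return (per-tube, pooled) adapter concentration in uM."""
    tube_total_ul = sum(HYBRIDIZATION_VOLUMES_UL)  # 12.5 ul per tube
    in_tube = diluted_concentration(oligo_stock_um, 1.0, tube_total_ul)
    # Pooling two equal-volume tubes halves the concentration of each adapter.
    pooled = diluted_concentration(in_tube, tube_total_ul, 2 * tube_total_ul)
    return in_tube, pooled


if __name__ == "__main__":
    in_tube, pooled = adapter_mix_concentrations()
    print(f"per tube: {in_tube} uM, pooled: {pooled} uM")
```

The same helper can be reused if oligos are supplied at a different stock concentration, in which case the water volume would need to be adjusted to reach the same final value.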

3

Method Perform all steps at room temperature, unless otherwise stated. All work with unamplified DNA should be done in an appropriate cleanroom. Amplification and subsequent steps should be done outside the cleanroom.

3.1 Library Preparation: Blunt-End Repair

1. Prepare a master mix containing, per reaction, 8.56 μl water, 3.5 μl Buffer Tango (with BSA) (10×), 0.14 μl dNTP mix (25 mM each), 0.35 μl ATP (100 mM), 1.75 μl T4


polynucleotide kinase (10 U/μl), and 0.7 μl T4 DNA polymerase (5 U/μl). 2. Add 20 μl DNA extract (see Notes 3 and 4) to 15 μl master mix, mix gently by pipetting up and down, and incubate in a thermal cycler at 25 °C for 20 min, followed by a heat inactivation of the enzymes at 72 °C for 20 min, and then hold at 12 °C.

3.2 Library Preparation: Adapter Ligation

1. Prepare a master mix containing, per reaction, 4 μl water, 6 μl T4 DNA ligase buffer (10×), 6 μl PEG-4000 (50%), and 1.5 μl T4 DNA ligase (5 U/μl). 2. Add 7.5 μl adapter mix (20 μM each) (see Note 5) to each 35 μl sample, and mix carefully but thoroughly by flicking the PCR strip. 3. Add 17.5 μl master mix to each sample and mix carefully but thoroughly by flicking the PCR strip (see Note 6). 4. Incubate in a thermal cycler at 22 °C for 30 min, and then hold at 12 °C.
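Master mixes in this protocol are given per reaction, so they need to be scaled to the number of samples and blanks. The sketch below scales the ligation mix from step 1 and also computes the final adapter concentration in the 60 μl ligation reaction. The 10% pipetting overage is an assumption, not a value from the protocol, and the function names are illustrative.

```python
# Per-reaction ligation master mix (ul), from step 1 of Subheading 3.2.
LIGATION_MIX_UL = {
    "water": 4.0,
    "T4 DNA ligase buffer (10x)": 6.0,
    "PEG-4000 (50%)": 6.0,
    "T4 DNA ligase (5 U/ul)": 1.5,
}


def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale a per-reaction recipe to n_reactions plus a fractional overage.

    The default 10% overage is an assumed allowance for pipetting loss.
    """
    if n_reactions < 1 or overage < 0:
        raise ValueError("n_reactions must be >= 1 and overage >= 0")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


def final_adapter_um(adapter_ul=7.5, adapter_stock_um=20.0,
                     sample_ul=35.0, mix_ul=17.5):
    """Final concentration (uM) of each adapter in the ligation reaction."""
    total_ul = sample_ul + adapter_ul + mix_ul
    return adapter_stock_um * adapter_ul / total_ul


if __name__ == "__main__":
    print(scale_master_mix(LIGATION_MIX_UL, n_reactions=8))
    print(f"final adapter concentration: {final_adapter_um()} uM")
```

With the volumes given in steps 2 and 3, the final adapter concentration works out to 2.5 μM, which matches the upper end of the 0.5–2.5 μM range discussed in the Introduction, assuming that range refers to the concentration in the ligation reaction.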

3.3 Library Preparation: Reaction Cleanup

1. Purify the reaction by using MinElute PCR Purification Kit. Add 300 μl Buffer PB (binding buffer) to each reaction and mix. 2. Transfer the sample to the MinElute column and centrifuge for 1 min at 17,900 × g (13,000 rpm). 3. Discard flow-through. Wash the silica membrane using 650 μl Buffer PE (wash buffer), and centrifuge for 1 min at 17,900 × g (13,000 rpm). 4. Discard flow-through. Centrifuge the column for 1 min at 17,900 × g (13,000 rpm) to completely remove the Buffer PE. 5. Place the column into a clean 1.5 ml microcentrifuge tube. Add 10 μl Buffer EB (elution buffer) to the center of the silica membrane, and incubate for 5 min at room temperature. Centrifuge for 1 min at 17,900 × g (13,000 rpm). 6. Repeat step 5.
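The pairing of 17,900 × g with 13,000 rpm holds only for a particular rotor radius, so on a different microcentrifuge the rpm setting may need to be recalculated. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The 9.5 cm radius in the example is an assumption chosen because it roughly reproduces the stated pairing; the actual radius should be taken from the rotor documentation.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a given RCF with a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    # Assumed rotor radius of 9.5 cm; check the value for the rotor in use.
    print(f"{rcf_from_rpm(13000, 9.5):.0f} x g at 13,000 rpm")
    print(f"{rpm_from_rcf(17900, 7.0):.0f} rpm for 17,900 x g at 7.0 cm")
```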

3.4 Library Preparation: Adapter Fill-In

1. Prepare a master mix containing, per reaction, 14.1 μl water, 4 μl ThermoPol (reaction) buffer, 0.4 μl dNTP mix (25 mM each), and 1.5 μl Bst DNA polymerase, large fragment (8 U/μl).
2. Add 20 μl master mix to each 20 μl sample and mix gently. Incubate in a thermal cycler for 20 min at 25 °C, followed by heat inactivation at 80 °C for 20 min, and then hold at 12 °C.

3.5 Library Preparation: Quantification by qPCR

1. Make 1:20 dilutions of each sample in TET buffer (see Note 7) by using 1 μl of each sample (see Note 8).


2. Prepare the master mix using, for each reaction, 5 μl SYBR® Green PCR Master Mix (2×), 0.2 μl primer IS7 (10 μM), 0.2 μl primer IS8 (10 μM), and 3.6 μl water (see Notes 9 and 10), and mix thoroughly by inverting.
3. Pipette 9 μl master mix into a well of a qPCR plate and add 1 μl sample (1:20 diluted). Mix by pipetting up and down.
4. Seal the qPCR plate with adhesive PCR plate seals, vortex the plate thoroughly, and spin down for 30 s at 600 × g.
5. Place the qPCR plate into a qPCR machine, and incubate for 10 min at 95 °C, followed by 40 cycles of 15 s at 95 °C, 30 s at 60 °C, and 1 min at 72 °C with fluorescence measurement, and finally 3 min at 12 °C.
6. Identify the optimal number of amplification cycles based on the qPCR results (to avoid PCR plateau), and adjust this number for the different template amount and reaction volume used in the library amplification/indexing PCR (subtract 7 cycles; see Note 11).

3.6 Library Preparation: Amplification and Indexing

1. Prepare a master mix containing, per reaction, 17.8 μl water, 8 μl AccuPrime reaction mix (10×), and 3.2 μl AccuPrime Pfx DNA polymerase (2.5 U/μl).
2. Add 6 μl indexing primer P7 and 6 μl indexing primer P5 to each sample (39 μl).
3. Add 29 μl master mix to each sample, and incubate in a thermal cycler for 2 min at 95 °C, followed by X cycles of 15 s at 95 °C, 30 s at 60 °C, and 60 s at 68 °C, and a final 3 min at 68 °C; hold at 12 °C (see Notes 12–14).

3.7 Library Preparation: Reaction Cleanup

1. Purify the reaction using the MinElute PCR Purification Kit. Add 400 μl Buffer PB (binding buffer) to each reaction and mix.
2. Apply the sample to the MinElute column and centrifuge for 1 min at 17,900 × g (13,000 rpm).
3. Discard flow-through. Wash the column with 650 μl Buffer PE (wash buffer) and centrifuge for 1 min at 17,900 × g (13,000 rpm).
4. Discard flow-through. Centrifuge the column for 1 min at 17,900 × g (13,000 rpm) to completely remove the Buffer PE.
5. Place the column in a clean 1.5 ml microcentrifuge tube. Add 10 μl Buffer EB (elution buffer) to the center of the MinElute membrane, and incubate for 5 min at room temperature. Centrifuge for 1 min at 17,900 × g (13,000 rpm).
6. Repeat step 5.
7. Store the amplified and indexed libraries at −20 °C.

4 Notes

1. The ligation of adapters to themselves should not be chemically possible because their ends are non-phosphorylated [15]. Nevertheless, adapter dimers are always visible.
2. Other polymerases can be used. It has been shown that AccuPrime Pfx exhibits minimal biases during amplification [16].
3. Use a total of 20 μl DNA extract, or no more than 500 ng DNA, per reaction.
4. Include library blanks by adding water instead of the extract. We recommend using one library blank per three to four samples.
5. Using less adapter will also give reliable results, e.g., 0.5 μM each (see Fig. 2).
6. Mixing thoroughly is important to prevent ligation of the sample to itself.
7. It is recommended to prepare the 1:20 dilutions in TET buffer immediately, even if the qPCR will not be done on the same day.
8. Suggested pause point: You can freeze your samples for several days at −20 °C before continuing with the library preparation.
9. Include at least one negative control by using water or TET buffer, and do three or four replicates per reaction.
10. For each sample, make three or four replicates for the qPCR measurement.
11. The calculation of the number of cycles assumes that the number of molecules doubles with each PCR cycle. As the qPCR is performed with 1 μl of a 1:20 dilution of the template, ten cycles have to be subtracted to account for the larger amount of template in the indexing PCR reaction. Additionally, as the volume of the indexing PCR is eight times larger than that of the qPCR, three cycles should be added. Taken together, this means subtracting seven cycles from the number estimated using qPCR.
12. We recommend always using at least six cycles for the amplification/indexing PCR.
13. X equals the optimal number of amplification cycles for each sample based on the previous qPCR results.
14. Suggested pause point: You can store your amplified and indexed libraries for several days at 4 °C before you purify them.
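The arithmetic in Note 11 can be written out as a short calculation. The Python sketch below is illustrative only. It assumes perfect doubling per cycle (as the note does), takes the 39 μl template volume and the resulting 80 μl reaction volume from Subheading 3.6 and the 10 μl qPCR volume from Subheading 3.5, and rounds each log2 term to a whole cycle, which is our reading of the note rather than something it states explicitly. The function names are hypothetical.

```python
import math


def cycle_adjustment(dilution_factor, qpcr_template_ul, pcr_template_ul,
                     qpcr_volume_ul, pcr_volume_ul):
    """Return the number of cycles to subtract from the qPCR estimate.

    Assumes the number of molecules doubles with every cycle.
    """
    # Fold increase in template amount going from qPCR to indexing PCR.
    template_fold = dilution_factor * pcr_template_ul / qpcr_template_ul
    # Fold increase in reaction volume going from qPCR to indexing PCR.
    volume_fold = pcr_volume_ul / qpcr_volume_ul
    subtract = round(math.log2(template_fold))  # more template: fewer cycles
    add = round(math.log2(volume_fold))         # larger volume: more cycles
    return subtract - add


def indexing_cycles(qpcr_cycles, adjustment, minimum=6):
    """Apply the adjustment, never going below the minimum from Note 12."""
    return max(qpcr_cycles - adjustment, minimum)


if __name__ == "__main__":
    adj = cycle_adjustment(dilution_factor=20, qpcr_template_ul=1,
                           pcr_template_ul=39, qpcr_volume_ul=10,
                           pcr_volume_ul=80)
    print("cycles to subtract:", adj)
    print("indexing PCR cycles for a qPCR estimate of 20:", indexing_cycles(20, adj))
```

With these inputs, the template term is log2(780), which rounds to ten cycles, and the volume term is log2(8), which is three cycles.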


References

1. Green RE, Krause J, Ptak SE et al (2006) Analysis of one million base pairs of Neanderthal DNA. Nature 444:330–336. https://doi.org/10.1038/nature05336
2. Hofreiter M, Paijmans JLA, Goodchild H et al (2014) The future of ancient DNA: technical advances and conceptual shifts. BioEssays 37:284–293. https://doi.org/10.1002/bies.201400160
3. Noonan JP, Priest JR, Rohland N et al (2005) Genomic sequencing of Pleistocene cave bears. Science 309:597–599. https://doi.org/10.1126/science.1113485
4. Briggs AW, Heyn P (2012) Preparation of next-generation sequencing libraries from damaged DNA. Methods Mol Biol 840:143–154
5. Gansauge M-T, Meyer M (2013) Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat Protoc 8:737–748. https://doi.org/10.1038/nprot.2013.038
6. Fortes GG, Paijmans JLA (2015) Analysis of whole mitogenomes from ancient samples. Methods Mol Biol 1347:179–195
7. Barlow A, Gonzalez Fortes GM, Dalen L et al (2016) Massive influence of DNA isolation and library preparation approaches on palaeogenomic sequencing data. bioRxiv. https://doi.org/10.1101/075911
8. Brace S, Thomas JA, Dalén L et al (2016) Evolutionary history of the Nesophontidae, the last unplaced recent mammal family. Mol Biol Evol 33:msw186. https://doi.org/10.1093/molbev/msw186
9. Gaubert P, Patel R, Veron G et al (2017) Phylogeography of the Small Indian Civet and Origin of Introductions to Western Indian Ocean Islands. J Hered 108:270–279. https://doi.org/10.1093/jhered/esw085
10. Martins RF, Fickel J, Le M et al (2017) Phylogeography of red muntjacs reveals three distinct mitochondrial lineages. BMC Evol Biol 17:34. https://doi.org/10.1186/s12862-017-0888-0
11. Palkopoulou E, Mallick S, Skoglund P et al (2015) Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr Biol 25:1395–1400. https://doi.org/10.1016/j.cub.2015.04.007
12. Patel RP, Förster DW, Kitchener AC et al (2016) Two species of Southeast Asian cats in the genus Catopuma with diverging histories: an island endemic forest specialist and a widespread habitat generalist. R Soc Open Sci 3:160350. https://doi.org/10.1098/rsos.160350
13. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc. https://doi.org/10.1101/pdb.prot5448
14. Kircher M, Sawyer S, Meyer M (2012) Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res 40:1–8. https://doi.org/10.1093/nar/gkr771
15. Lehman IR (1974) DNA ligase: structure, mechanism, and function
16. Dabney J, Meyer M (2012) Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. BioTechniques. https://doi.org/10.2144/000113809

Chapter 9

A Method for Single-Stranded Ancient DNA Library Preparation

Marie-Theres Gansauge and Matthias Meyer

Abstract

Genomic library preparation from highly degraded DNA is more efficient when library molecules are prepared separately from the complementary strands of DNA fragments. We describe a protocol in which libraries are constructed from single DNA strands in a three-step procedure: single-stranded ligation of the first adapter with T4 DNA ligase in the presence of a splinter oligonucleotide, copying of the DNA strand with a proofreading polymerase, and blunt-end ligation of the second double-stranded adapter with T4 DNA ligase.

Key words NGS, Library preparation, Ancient DNA, Single-stranded ligation, T4 DNA ligase

1 Introduction

High-throughput sequencing requires the preparation of DNA libraries. This process involves the ligation of short, known adapter sequences to both ends of DNA molecules, enabling their amplification and readout. Commonly used double-stranded library preparation methods are not ideally suited for recovering highly degraded DNA from ancient specimens. In 2012, we described a single-stranded library preparation technique that increased the sequence information retrieved from a small sample of finger bone from an extinct Denisovan individual by approximately one order of magnitude, allowing its genome to be sequenced to high coverage [1]. The method has since been used to generate genome-wide sequence data from many other ancient specimens at various levels of coverage depth (e.g., [2–4]), including some samples of extraordinarily old age [5]. Comparisons to double-stranded library preparation have confirmed that the single-stranded method greatly increases sequence complexity in libraries [6, 7] and in many cases improves the ratio of endogenous to environmental sequences [6].

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_9, © Springer Science+Business Media, LLC, part of Springer Nature 2019

[Figure 1: schematic of the reaction steps, labeled by protocol step (3.1.1, 3.1.2, 3.2.2, 3.2.3, 3.3.2, 3.4.2, 3.4.4). Legend: DNA substrate; adapter with TEG-biotin; splinter; primer with PTOs; double-stranded second adapter; streptavidin-covered magnetic bead.]

Fig. 1 Single-stranded library preparation. In the protocol provided here, DNA fragments are dephosphorylated at the 5′ and 3′ ends and separated into single strands by heat denaturation. 3′ biotinylated adapter molecules are attached to the 3′ ends of the DNA fragments using T4 DNA ligase and a splinter oligonucleotide carrying a stretch of six random nucleotides (marked as “N”). Following the immobilization of the ligation products on streptavidin-coated beads, the splinter oligonucleotide is removed by a bead wash at elevated temperature. Synthesis of the second strand is carried out using the Klenow fragment of E. coli DNA polymerase I. Unincorporated primers are removed through a bead wash at elevated temperature. Following the blunt-end ligation of the second adapter, the final library strand is released from the beads by heat denaturation

The single-stranded library preparation method that we described in 2012 has been refined in recent years [8, 9]. In its most recent implementation [10], single-stranded DNA ligation is performed using T4 DNA ligase in combination with a “splinter” oligonucleotide [11]. This replaces CircLigase, which is an expensive RNA ligase that is available from only one supplier. This modification has helped reduce costs and improve the robustness of the protocol (see Fig. 1 for an overview of the reaction steps). Nevertheless, single-stranded library preparation remains more time-consuming than double-stranded methods and requires higher initial investments in reagents due to expensive oligonucleotide modifications (see Table 1 for oligonucleotide sequences). In addition, a change to one of the Illumina adapter sequences necessitates the use of a nonstandard primer in sequencing (Fig. 2). These aspects should be considered when deciding whether to implement the method described below.

Table 1 Overview of the oligonucleotides required for single-stranded library preparation, amplification, and sequencing

Name | Sequence (5′–3′) | Application
CL78 | Pho-AGATCGGAAG(C3Spacer)10-TEG-biotin | Adapter, single-stranded ligation
TL136 | SpacerC12-AA(SpacerC12)CTTCCGATCTNNNNNNN-AmC6 | Splinter, single-stranded ligation
CL130 | GTGACTGGAGTTCAGACGTGTGCTCTTCC*GA*TC*T | Extension primer
CL53 | CGACGCTCTTC-ddC | Adapter oligo 1, double-stranded ligation
CL73 | Pho-GGAAGAGCGTCGTGTAGGGAAAGAG*T*G*T*A | Adapter oligo 2, double-stranded ligation
CL104 | Pho-TCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGA-Pho | Positive control oligo
CL105 | ACACTCTTTCCCTACACGACGCTCTTCCTCGTCGTTTGGTATGGCTTC | Primer for qPCR standard
CL106 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCATGTAACTCGCCTTGATCGT | Primer for qPCR standard
IS7 | ACACTCTTTCCCTACACGAC | qPCR primer
IS8 | GTGACTGGAGTTCAGACGTGT | qPCR primer
IS5 | AATGATACGGCGACCACCGA | Primer for reamplification
IS6 | CAAGCAGAAGACGGCATACGA | Primer for reamplification
CL72 | ACACTCTTTCCCTACACGACGCTCTTCC | Sequencing primer

All oligonucleotides should be purified by reverse-phase HPLC with the exception of CL78, which requires dual purification by ion-exchange HPLC. Possible suppliers include Sigma-Aldrich and Eurogentec

[Figure 2: schematic comparing the adapter sequences of a standard Illumina double-indexed library and a single-stranded double-indexed library, with the binding sites of the PCR primers (IS5, IS6, IS7, IS8, indexing primer P5, indexing primer P7) and of the sequencing primers (insert read forward, CL72 for single-stranded libraries; index read P5; index read P7; insert read reverse).]

Fig. 2 Schematic overview of adapter sequences and primers used in library amplification and sequencing. Sequences are shown in the 5′–3′ direction. “N’s” denote sample-specific index sequences (see Note 9). Standard Illumina adapter sequences are shown for comparison. For the forward insert read, single-stranded libraries require a nonstandard sequencing primer (CL72). Primers IS5 and IS6 can be used for library reamplification if it becomes necessary

2 Materials

Incubation steps involving bead suspensions are best carried out in a thermoshaker with an interval mixing option (e.g., Cooling ThermoMixer MKR13, HLC/Ditabis) to avoid bead settling. Manual mixing is possible but substantially increases hands-on time. Mixing intervals may be extended to 5 min in the case of manual mixing. It is recommended to prepare master mixes for all reaction steps. Enzymes may be included in the master mixes. Keep the master mixes on ice until used.

2.1 Preparation of Buffers (Prepare 50 mL Each, Store at Room Temperature for up to 6 Months)

1. 0.1× BWT + SDS buffer: 0.1 M NaCl, 10 mM Tris–HCl, 1 mM EDTA, 0.05% Tween 20, 0.5% SDS (pH 8.0).
2. 0.1× BWT buffer: 0.1 M NaCl, 10 mM Tris–HCl, 1 mM EDTA, 0.05% Tween 20 (pH 8.0).
3. Stringency wash buffer: 0.1× SSC (Sigma-Aldrich), 0.1% SDS.
4. TT buffer: 10 mM Tris–HCl, 0.05% Tween 20 (pH 8.0).
5. TE buffer: 10 mM Tris–HCl, 1 mM EDTA (pH 8.0).
6. TET buffer: 10 mM Tris–HCl, 1 mM EDTA, 0.05% Tween 20 (pH 8.0).
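The buffer compositions above are given as final concentrations, so the volumes of stock solution to combine depend on the stocks at hand. As a hedged illustration, the Python sketch below computes the volumes for 50 mL of 0.1× BWT + SDS buffer using C1·V1 = C2·V2. The stock concentrations (5 M NaCl, 1 M Tris–HCl, 0.5 M EDTA, 10% Tween 20, 20% SDS) are assumptions chosen for the example and are not specified in this chapter; substitute the concentrations of your own stocks.

```python
def stock_volume_ml(target_conc, stock_conc, final_volume_ml):
    """Solve C1*V1 = C2*V2 for V1. Both concentrations must share a unit."""
    if stock_conc <= target_conc:
        raise ValueError("stock must be more concentrated than the target")
    return target_conc / stock_conc * final_volume_ml


# (component, target concentration, ASSUMED stock concentration, unit)
BWT_SDS_RECIPE = [
    ("NaCl", 0.1, 5.0, "M"),
    ("Tris-HCl pH 8.0", 10.0, 1000.0, "mM"),
    ("EDTA pH 8.0", 1.0, 500.0, "mM"),
    ("Tween 20", 0.05, 10.0, "%"),
    ("SDS", 0.5, 20.0, "%"),
]


def recipe_volumes(recipe, final_volume_ml=50.0):
    """Return the volume (mL) of each stock and of water for one buffer."""
    volumes = {
        name: stock_volume_ml(target, stock, final_volume_ml)
        for name, target, stock, _unit in recipe
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, vol in recipe_volumes(BWT_SDS_RECIPE).items():
        print(f"{name:18s} {vol:6.2f} mL")
```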

2.2 Adapter Decontamination and Hybridization

1. CL78 adapter decontamination: In a 0.2 mL PCR tube, combine 12 μL water, 4 μL 100 μM CL78, 2 μL 10× T4 RNA ligation buffer (New England Biolabs), 1 μL 10 U/μL Klenow fragment (ThermoFisher Scientific), and 1 μL 10 U/μL T4 polynucleotide kinase (ThermoFisher Scientific). Total reaction volume is 20 μL. Mix the reagents properly, spin the tube briefly in a microcentrifuge, and incubate the reaction in a thermal cycler for 20 min at 37 °C followed by 1 min at 95 °C (enzyme inactivation). The final concentration of CL78 is 20 μM.
2. TL136 splinter decontamination: In a 0.2 mL PCR tube, combine 8 μL water, 8 μL 100 μM TL136, 2 μL 10× T4 RNA ligation buffer, 1 μL 10 U/μL Klenow fragment, and 1 μL 10 U/μL T4 polynucleotide kinase. Total reaction volume is 20 μL. Mix the reagents properly, spin the tube briefly in a microcentrifuge, and incubate the reaction in a thermal cycler for 20 min at 37 °C followed by 1 min at 95 °C (enzyme inactivation). The final concentration of TL136 is 40 μM.
3. Adapter/splinter hybridization: In a 0.2 mL PCR tube, combine 20 μL purified adapter CL78 with 20 μL purified splinter TL136 for a total volume of 40 μL. Heat the reaction mix to 95 °C for 10 s in a thermal cycler and cool down to 10 °C. The final concentration of hybridized adapter CL78/TL136 is 10/20 μM. Store at −20 °C until used.
4. Preparation of double-stranded adapter CL53/73: In a 0.2 mL PCR tube, combine 9.5 μL TE, 0.5 μL 5 M NaCl, 20 μL 500 μM CL53, and 20 μL 500 μM CL73. Total volume is 50 μL. Mix and incubate the reaction for 10 s at 95 °C in a thermal cycler, and cool down to 14 °C at a rate of 0.1 °C/s. Add 50 μL TE to obtain 100 μM CL53/73 in a final volume of 100 μL.
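The final concentrations quoted in steps 1–4 follow directly from the volumes used. The short Python check below reproduces them; it is a sketch for verifying one's own pipetting scheme, and the function name is hypothetical.

```python
def diluted_conc(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after bringing stock_volume_ul to total_volume_ul."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Step 1: 4 uL of 100 uM CL78 in a 20 uL reaction.
cl78_um = diluted_conc(100, 4, 20)
# Step 2: 8 uL of 100 uM TL136 in a 20 uL reaction.
tl136_um = diluted_conc(100, 8, 20)
# Step 3: equal volumes (20 uL each) combined to 40 uL.
hyb_cl78_um = diluted_conc(cl78_um, 20, 40)
hyb_tl136_um = diluted_conc(tl136_um, 20, 40)
# Step 4: 20 uL of each 500 uM oligo in 50 uL, then diluted to 100 uL with TE.
cl53_73_um = diluted_conc(diluted_conc(500, 20, 50), 50, 100)

if __name__ == "__main__":
    print(f"CL78 after decontamination:  {cl78_um:.0f} uM")
    print(f"TL136 after decontamination: {tl136_um:.0f} uM")
    print(f"CL78/TL136 after hybridization: {hyb_cl78_um:.0f}/{hyb_tl136_um:.0f} uM")
    print(f"CL53/73 final: {cl53_73_um:.0f} uM")
```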

3 Methods

3.1 Heat Denaturation, Dephosphorylation, and Ligation of First Adapter

1. For each sample, combine the following reagents in a 0.5 mL tube (see Note 1): 8 μL 10× T4 RNA ligation buffer, 2 μL 2% Tween 20, 1 μL 1 U/μL FastAP (ThermoFisher Scientific), and sample DNA supplemented with TT buffer to 34.6 μL. Total reaction volume is 45.6 μL. Mix properly and briefly spin the tubes in a microcentrifuge. Incubate for 10 min at 37 °C and 2 min at 95 °C in a thermal cycler. Transfer the tubes directly from the thermal cycler into an ice water bath, and let them stand for 2 min.
2. Add the following components to the reaction mix to obtain a total reaction volume of 80 μL: 32 μL 50% PEG-8000, 0.4 μL 100 mM ATP, 1 μL 10/20 μM CL78/TL136, and 1 μL 30 U/μL T4 DNA ligase (ThermoFisher Scientific) (see Notes 2 and 3). Mix the reactions properly (see Note 4), and spin the tubes briefly in a microcentrifuge. Incubate for 1 h at 37 °C and 1 min at 95 °C in a thermal cycler. Transfer the tubes directly from the thermal cycler into an ice water bath. Proceed to the next step immediately or freeze the tubes at −20 °C.
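Because the sample input is topped up with TT buffer to a fixed 34.6 μL, the amount of TT buffer differs between samples. The following Python sketch computes the top-up volume and checks that the volumes in steps 1 and 2 add up to the stated totals of 45.6 μL and 80 μL. The names are hypothetical and the example input volume of 10 μL is arbitrary.

```python
SAMPLE_PLUS_TT_UL = 34.6  # sample DNA plus TT buffer, from step 1

# Volumes in microliters, taken from steps 1 and 2 of Subheading 3.1.
STEP1_UL = {
    "10x T4 RNA ligation buffer": 8.0,
    "2% Tween 20": 2.0,
    "FastAP (1 U/uL)": 1.0,
    "sample DNA + TT buffer": SAMPLE_PLUS_TT_UL,
}
STEP2_UL = {
    "50% PEG-8000": 32.0,
    "100 mM ATP": 0.4,
    "CL78/TL136 (10/20 uM)": 1.0,
    "T4 DNA ligase (30 U/uL)": 1.0,
}


def tt_buffer_ul(sample_dna_ul):
    """TT buffer needed to bring the sample input to 34.6 uL."""
    if not 0 <= sample_dna_ul <= SAMPLE_PLUS_TT_UL:
        raise ValueError("sample volume must be between 0 and 34.6 uL")
    return round(SAMPLE_PLUS_TT_UL - sample_dna_ul, 1)


if __name__ == "__main__":
    step1_total = sum(STEP1_UL.values())
    final_total = step1_total + sum(STEP2_UL.values())
    print(f"TT buffer for 10 uL of extract: {tt_buffer_ul(10):.1f} uL")
    print(f"volume after step 1: {step1_total:.1f} uL")
    print(f"volume after step 2: {final_total:.1f} uL")
    # Final PEG-8000 concentration in the ligation reaction.
    print(f"PEG-8000: {32.0 * 50 / final_total:.0f}%")
```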

3.2 Immobilize Ligation Products on Beads

1. Resuspend the stock solution of MyOne C1 beads (ThermoFisher Scientific) by vortexing. For each reaction, transfer 20 μL of the bead suspension into a 1.5 mL tube (multiply by the number of samples, e.g., 120 μL for six samples). Wash beads twice with 500 μL 0.1× BWT + SDS. Resuspend the beads in 250 μL 0.1× BWT + SDS (multiply by the number of samples, e.g., 1.5 mL for six samples). Per sample, transfer 250 μL beads to a 1.5 mL tube.
2. If the reactions were frozen at the end of step 2, Subheading 3.1, thaw the tubes and incubate them for 1 min at 95 °C in a thermal cycler. Transfer the tubes directly from the cycler into an ice water bath, and let them stand for 2 min. Add the reaction mix to the bead suspension, vortex, and rotate the bead suspension for 20 min at room temperature. Spin the tubes briefly in a microcentrifuge.
3. Pellet the beads using a magnetic rack. Pipette off and discard the supernatant. Add 200 μL 0.1× BWT + SDS and resuspend the beads by vortexing. Spin the tubes briefly in a microcentrifuge, place them on a magnetic rack, and discard the supernatant. Add 100 μL Stringency wash buffer and resuspend the beads by vortexing. Incubate the tubes for 3 min at 45 °C with interval mixing every 30 s. Spin the tubes briefly in a microcentrifuge, place them on a magnetic rack, and discard the supernatant. Add 200 μL 0.1× BWT and resuspend the beads by vortexing.

3.3 Primer Annealing and Extension

1. For each reaction, combine the following reagents in a 1.5 mL tube: 39.1 μL water, 5 μL 10× Klenow reaction buffer (ThermoFisher Scientific), 0.4 μL 25 mM dNTP, 2.5 μL 1% Tween 20, and 1 μL 100 μM CL130. Total volume is 48 μL.
2. Spin the tubes with the bead suspension in a microcentrifuge, and place them on a magnetic rack. Discard the supernatant, and resuspend the beads in the reaction mix from step 1, Subheading 3.3 by vortexing. Incubate the tubes for 2 min in a thermoshaker pre-heated to 65 °C. Place the tubes into an ice water bath. Add 2 μL 10 U/μL Klenow fragment (ThermoFisher Scientific), and briefly mix the reactions by vortexing. Incubate the tubes in a thermoshaker for 5 min at 25 °C, followed by 25 min at 35 °C with interval mixing every 30 s.
3. Spin the tubes in a microcentrifuge. Perform three bead washes exactly as described in step 3, Subheading 3.2.

3.4 Ligation of Second Adapter, Library Elution

1. For each reaction, combine the following reagents in a 1.5 mL tube: 73.5 μL water, 10 μL 10× T4 DNA ligase buffer (ThermoFisher Scientific) (see Note 5), 10 μL 50% PEG 4000, 2.5 μL 1% Tween 20, 2 μL 100 μM CL53/73, and 2 μL 5 U/μL T4 DNA ligase (ThermoFisher Scientific). Total reaction volume is 100 μL.
2. Spin the tubes with the bead suspension in a microcentrifuge. Place the tubes on a magnetic rack and discard the supernatant. Resuspend the beads in the ligation mix from step 1. Incubate the reaction for 1 h at 22 °C with interval mixing/vortexing every 30 s.
3. Spin the tubes in a microcentrifuge. Perform three bead washes exactly as described in step 3, Subheading 3.2.
4. Spin the tubes briefly in a microcentrifuge, place them on a magnetic rack, and discard the supernatant. Resuspend the beads in 50 μL TT buffer by vortexing, and transfer the bead suspension to 0.2 mL PCR strip tubes. Spin the tubes briefly in a microcentrifuge. Incubate the bead suspension for 1 min at 95 °C. Immediately transfer the PCR strip tubes to a magnetic rack, and transfer the supernatant (the library) to fresh 1.5 mL tubes.

3.5 Library Quantification by qPCR

1. Dilute 1 μL of each library 50-fold in TET buffer (see Note 6). For each reaction (samples and qPCR standard, see Note 7), combine the following reagents in the wells of a 96-well PCR plate: 10.5 μL water, 12.5 μL 2× Maxima SYBR Green qPCR Master Mix (ThermoFisher Scientific), 0.5 μL 10 μM primer IS7, 0.5 μL 10 μM primer IS8, and 1 μL of diluted library/qPCR standard. Total reaction volume is 25 μL. Close the wells with optical caps, and briefly spin the PCR plate in a centrifuge at 1000 × g. Place the plate into a qPCR cycler, and incubate the reactions at 95 °C for 10 min, followed by 40 cycles at 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s. Carry out fluorescence measurements at the end of each extension step.
2. Use the software provided with the qPCR system to calculate the number of molecules in each library (see Note 8).
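Instrument software normally performs the standard-curve calculation in step 2, but it can help to see what that calculation involves. The Python sketch below is a generic absolute-quantification example and not the algorithm of any particular qPCR software: it fits quantification cycle (Cq) against log10 copy number for the standard dilutions and converts a library Cq into molecules per microliter of undiluted library, using the 50-fold dilution and 1 μL input from step 1. The Cq values in the usage example are made up for illustration, and all function names are hypothetical.

```python
import math


def fit_standard_curve(copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to get copies in the reaction."""
    return 10 ** ((cq - intercept) / slope)


def library_molecules_per_ul(cq, slope, intercept,
                             dilution_factor=50, template_ul=1.0):
    """Molecules per microliter of undiluted library."""
    return copies_from_cq(cq, slope, intercept) / template_ul * dilution_factor


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Standards from 10^9 to 10^2 copies per microliter (see Note 7),
    # with invented Cq values lying on an ideal curve.
    standards = [10.0 ** e for e in range(9, 1, -1)]
    cqs = [38.0 - 3.32 * math.log10(c) for c in standards]
    slope, intercept = fit_standard_curve(standards, cqs)
    print(f"slope {slope:.2f}, efficiency {amplification_efficiency(slope):.2f}")
    print(f"library: {library_molecules_per_ul(18.0, slope, intercept):.3g} molecules/uL")
```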

3.6 Amplification and Indexing of the Sequencing Libraries

1. For each reaction, combine the following reagents in 0.2 mL PCR tubes: 20 μL water, 10 μL 10× AccuPrime Pfx buffer (ThermoFisher Scientific), 10 μL 10 μM P7 indexing primer, 10 μL 10 μM P5 indexing primer (see Note 9), 49 μL library, and 1 μL 2.5 U/μL AccuPrime Pfx DNA polymerase (ThermoFisher Scientific). Total reaction volume is 100 μL. Mix the reaction properly and spin the tubes in a microcentrifuge. Incubate the reactions in a thermal cycler at 95 °C for 2 min, followed by an appropriate number of cycles (see Note 10) at 95 °C for 20 s, 60 °C for 30 s, and 68 °C for 1 min, with a final extension step of 5 min at 68 °C.
2. Purify the amplified libraries using the MinElute PCR purification kit (Qiagen). Elute the purified products in 30 μL TE buffer. Determine the concentration using a DNA-1000 chip on a Bioanalyzer 2100 (Agilent Technologies), and sequence the libraries on an Illumina MiSeq or HiSeq instrument using CL72 as the primer for the forward insert read (see Fig. 2).

4 Notes

1. Choose 0.2 mL tubes if the thermal cycler cannot fit 0.5 mL tubes. At least one negative control containing water instead of sample DNA should be included in each experiment. We also recommend the addition of a positive control, e.g., using 0.1 pmol of oligonucleotide CL104.
2. PEG-8000 is highly viscous and requires very thorough mixing when preparing a master mix.
3. T4 DNA ligase is used in two different concentrations throughout this protocol: 30 U/μL for single-stranded ligation (step 2, Subheading 3.1) and 5 U/μL for double-stranded ligation (step 1, Subheading 3.4).
4. Proper mixing in this step is critical to the success of the library preparation. Vortexing is ineffective due to the high viscosity of PEG-8000. Mix by flicking the tubes with a finger several times, and check the success of mixing by eye.
5. White precipitate may be present in the ligation buffer after thawing. Heat the buffer vial briefly to 37 °C and vortex until the precipitate has dissolved.
6. Save the library dilution for repeated measurements in case they become necessary. Store at −20 °C.
7. For the preparation of a qPCR standard with Illumina adapters, use pUC19 DNA and the primer pair CL105/CL106 to generate a 122-bp PCR product. Purify the product using the MinElute PCR Purification Kit. Determine its concentration and prepare a tenfold dilution series of the PCR product in TET buffer, ranging from 10⁹ to 10² copies of the PCR product per microliter [8].
8. Library negative controls are used to infer the number of artifact molecules that have formed during library preparation (typically less than 3 × 10⁸ molecules). Order newly synthesized batches of CL78 and TL136 if artifacts are formed in greater number.
9. Sequences of indexing primers are published in Gansauge and Meyer [8].
10. Optimal cycle numbers for amplification can be inferred individually for each sample using the qPCR amplification plots. Cycling into PCR plateau leads to the formation of undesired heteroduplexes and should be avoided [8].
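Note 7 requires converting the measured concentration of the 122-bp standard into copies per microliter before setting up the dilution series. A minimal Python sketch of that conversion follows. It assumes an average mass of 650 g/mol per base pair of double-stranded DNA, which is a commonly used approximation and not a value given in this chapter; the 10 ng/μL concentration in the usage example is arbitrary and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
MEAN_BP_MASS_G_PER_MOL = 650  # ASSUMED average mass of one base pair (dsDNA)


def copies_per_ul(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration into copies per microliter."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    mol_per_ul = grams_per_ul / (length_bp * MEAN_BP_MASS_G_PER_MOL)
    return mol_per_ul * AVOGADRO


def fold_dilution_to(target_copies_per_ul, current_copies_per_ul):
    """Fold dilution needed to reach the target concentration."""
    if current_copies_per_ul < target_copies_per_ul:
        raise ValueError("stock is already more dilute than the target")
    return current_copies_per_ul / target_copies_per_ul


def tenfold_series(top_exponent=9, bottom_exponent=2):
    """Copies per microliter for each point of the tenfold dilution series."""
    return [10.0 ** e for e in range(top_exponent, bottom_exponent - 1, -1)]


if __name__ == "__main__":
    stock = copies_per_ul(conc_ng_per_ul=10.0, length_bp=122)
    print(f"stock: {stock:.3g} copies/uL")
    print(f"dilute {fold_dilution_to(1e9, stock):.1f}-fold to reach 1e9 copies/uL")
    print("series:", [f"{c:.0e}" for c in tenfold_series()])
```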


References

1. Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prufer K, de Filippo C, Sudmant PH, Alkan C, Fu Q, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Stenzel U, Dabney J, Shendure J, Kitzman J, Hammer MF, Shunkov MV, Derevianko AP, Patterson N, Andres AM, Eichler EE, Slatkin M, Reich D, Kelso J, Paabo S (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science 338(6104):222–226
2. Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, Bondarev AA, Johnson PL, Aximu-Petri A, Prufer K, de Filippo C, Meyer M, Zwyns N, Salazar-Garcia DC, Kuzmin YV, Keates SG, Kosintsev PA, Razhev DI, Richards MP, Peristov NV, Lachmann M, Douka K, Higham TF, Slatkin M, Hublin JJ, Reich D, Kelso J, Viola TB, Paabo S (2014) Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514(7523):445–449
3. Prufer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, Li H, Mallick S, Dannemann M, Fu Q, Kircher M, Kuhlwilm M, Lachmann M, Meyer M, Ongyerth M, Siebauer M, Theunert C, Tandon A, Moorjani P, Pickrell J, Mullikin JC, Vohr SH, Green RE, Hellmann I, Johnson PL, Blanche H, Cann H, Kitzman JO, Shendure J, Eichler EE, Lein ES, Bakken TE, Golovanova LV, Doronichev VB, Shunkov MV, Derevianko AP, Viola B, Slatkin M, Reich D, Kelso J, Paabo S (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505(7481):43–49
4. Sawyer S, Renaud G, Viola B, Hublin JJ, Gansauge MT, Shunkov MV, Derevianko AP, Prufer K, Kelso J, Paabo S (2015) Nuclear and mitochondrial DNA sequences from two Denisovan individuals. Proc Natl Acad Sci U S A 112(51):15696–15700
5. Meyer M, Arsuaga JL, de Filippo C, Nagel S, Aximu-Petri A, Nickel B, Martinez I, Gracia A, Bermudez de Castro JM, Carbonell E, Viola B, Kelso J, Prufer K, Paabo S (2016) Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531(7595):504–507. https://doi.org/10.1038/nature17405
6. Bennett EA, Massilani D, Lizzo G, Daligault J, Geigl EM, Grange T (2014) Library construction for ancient genomics: single strand or double strand? BioTechniques 56(6):289–290, 292–296, 298, passim
7. Wales N, Caroe C, Sandoval-Velasco M, Gamba C, Barnett R, Samaniego JA, Madrigal JR, Orlando L, Gilbert MT (2015) New insights on single-stranded versus double-stranded DNA library preparation for ancient DNA. BioTechniques 59(6):368–371
8. Gansauge MT, Meyer M (2013) Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat Protoc 8(4):737–748
9. Korlevic P, Gerber T, Gansauge MT, Hajdinjak M, Nagel S, Aximu-Petri A, Meyer M (2015) Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. BioTechniques 59(2):87–93
10. Gansauge MT, Gerber T, Glocke I, Korlević P, Lippik L, Nagel S, Riehl LM, Schmidt A, Meyer M (2017) Single-stranded DNA library preparation from highly degraded DNA using T4 DNA ligase. Nucleic Acids Res 45(10):e79
11. Kwok CK, Ding Y, Sherlock ME, Assmann SM, Bevilacqua PC (2013) A hybridization-based approach for quantitative and low-bias single-stranded DNA ligation. Anal Biochem 435(2):181–186

Chapter 10

Sequencing Library Preparation from Degraded Samples for Non-illumina Sequencing Platforms

Renata F. Martins, Marie-Louise Kampmann, and Daniel W. Förster

Abstract

Efficient methods for building genomic sequencing libraries from degraded DNA have been in place for Illumina sequencing platforms for some years now, but such methods are still lacking for other sequencing platforms. Here, we provide a protocol for building genomic libraries from degraded DNA (archival or ancient sample material) for sequencing on the Ion Torrent™ high-throughput sequencing platforms. In addition to a reduction in time and cost in comparison to commercial kits, this protocol removes purification steps prior to library amplification, an important consideration for work involving historical samples. Libraries prepared using this method are appropriate for either shotgun sequencing or enrichment-based downstream approaches.

Key words Archival DNA, Degraded samples, Ion Torrent™ PGM, Non-illumina platforms, Sequencing libraries

1 Introduction

DNA obtained from archival or ancient samples is often highly degraded due to postmortem fragmentation of DNA strands, rendering more traditional PCR-based methods to recover DNA inefficient or ineffective [1]. While high-throughput sequencing technologies can make use of such short DNA fragments, most protocols developed for this purpose focus on the Illumina sequencing platforms, and methods optimized for ancient DNA (aDNA) samples have yet to be developed for other high-throughput sequencing platforms. The Ion Torrent™ Personal Genome Machine (PGM) system is a high-throughput sequencing platform that carries out sequencing by synthesis (SBS) using real-time measurement of hydrogen ions released during nucleotide incorporation [2]. The chip is flooded with a single nucleotide solution at a time; each nucleotide added during DNA replication results in a pH change which is then measured by sensors and directly converted into a DNA sequence.

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_10, © Springer Science+Business Media, LLC, part of Springer Nature 2019


An advantage of this sequencing platform is its scalability—the throughput can be adjusted by choosing different capacity chips. For example, a “smaller” chip like the Ion 314™ generates less data (100 Mb of sequencing data) and is therefore suited for a small number of samples, while “large” chips like the Ion 318™ generate up to 2 Gb of sequence data, suitable, for example, for larger sample sets. Here, we present a protocol to build sequencing libraries for the Ion Torrent™ PGM that is suitable for degraded DNA samples. The priority for modifying the commercial protocols for use with aDNA was to reduce template loss. This optimization was achieved by the removal of all purification steps before PCR amplification of final libraries, as the increased template length after adapter ligation ought to be sufficient to pass the recommended minimum length of many standard purification kits [3]. The protocol can also be applied to modern DNA samples, with the advantage of lower costs and time of library preparation compared to the commercial protocol provided by the manufacturer.

2 Materials

To avoid external contamination of archival and ancient samples, sequencing libraries should be prepared in a laboratory dedicated to the work with such material (clean lab). Post-amplification handling of sequencing libraries should be performed outside the clean lab, in post-PCR labs [1]. Unless stated otherwise, reagents should be kept at −20 °C until use, and reactions should be set up on ice.

2.1 Oligonucleotides and Amplification Primers as Adapted from the Commercial Kits

See Table 1.

2.2 Reagents

1. Molecular grade water (ddH2O).
2. Ion Xpress™ Plus Fragment Library 200 Kit (Ion Xpress™ kit; Thermo Fisher).
3. AmpliTaq Gold (Thermo Fisher, supplied with 10× AmpliTaq buffer and 25 mM MgCl2).
4. Oligo Hybridization Buffer (10×): 500 mM NaCl, 10 mM Tris–HCl (pH 8.0), 1 mM EDTA (pH 8.0), stored at 4 °C [4].
5. Bovine Serum Albumin (BSA).
6. SPRIselect magnetic beads (Beckman Coulter), stored at 4 °C.
7. Molecular grade ethanol (100%), stored at room temperature.

Table 1 Oligonucleotides necessary for this protocol, as adapted from commercial kits

Step | Component | Sequence (a) | Size (nt) | Remarks
P1 adapter | P1 adapter 5′–3′ | CCA CTA CGC CTC CGC TTT CCT CTC TAT GGG CAG TCG GTG AT | 41 | HPLC purification during synthesis
P1 adapter | P1 adapter 3′–5′ | ATC ACC GAC TGC CCA TAG AGA GGA AAG CGG AGG CGT AGT GG*T*T | 43 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 1 5′–3′ | CCA TCT CAT CCC T*G*C GTG TCT CCG ACT CAG CTA AGG TAA CGA T | 43 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 1 3′–5′ | ATC GTT ACC TTA GCT GAG TCG GAG ACA CG*C | 30 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 2 5′–3′ | CCA TCT CAT CCC T*G*C GTG TCT CCG ACT CAG TAA GGA GAA CGA T | 43 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 2 3′–5′ | ATC GTT CTC CTT ACT GAG TCG GAG ACA CG*C | 30 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 3 5′–3′ | CCA TCT CAT CCC T*G*C GTG TCT CCG ACT CAG AAG AGG ATT CGA T | 43 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 3 3′–5′ | ATC GAA TCC TCT TCT GAG TCG GAG ACA CG*C | 30 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 4 5′–3′ | CCA TCT CAT CCC T*G*C GTG TCT CCG ACT CAG TAC CAA GAT CGA T | 43 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 4 3′–5′ | ATC GAT CTT GGT ACT GAG TCG GAG ACA CG*C | 30 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 5 5′–3′ | CCA TCT CAT CCC T*G*C GTG TCT CCG ACT CAG CAG AAG GAA CGA T | 43 | HPLC purification during synthesis
Barcoded adapters | PGM Barcode 5 3′–5′ | ATC GTT CCT TCT GCT GAG TCG GAG ACA CG*C | 30 | HPLC purification during synthesis
Amplification | Primer A_amp | CCA TCT CAT CCC TGC GTG TC | 20 | HPLC purification during synthesis
Amplification | Primer P1_amp | CCA CTA CGC CTC CGC TTT CCT CTC TAT G | 28 | HPLC purification during synthesis

* Phosphorothioate bond in the sequence
(a) Bases in bold in the printed table indicate the barcode sequence (bold formatting is not reproduced here); for barcodes 6–48, please see ref. [2]
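Because each adapter in Table 1 is ordered as two separate oligonucleotides that must anneal, it can be worth checking the sequences for transcription errors before ordering. The following is a minimal sketch of such a check, not part of the published protocol. It assumes that both oligos of a pair are written 5′ to 3′ and that `*` marks a phosphorothioate bond; the function names are hypothetical.

```python
# Sanity check for adapter oligo pairs from Table 1.
# Assumptions (not stated in the protocol): both oligos of a pair are written
# 5'->3', and "*" only marks a phosphorothioate bond (it is not a base).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove spaces and phosphorothioate markers from a printed sequence."""
    return seq.replace(" ", "").replace("*", "").upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a printed sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def duplex_length(top: str, bottom: str) -> int:
    """Length of the paired region if the shorter strand pairs perfectly
    and completely with part of the longer strand, otherwise 0."""
    a, b = clean(top), reverse_complement(bottom)
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return len(short) if short in long_ else 0


P1_TOP = "CCA CTA CGC CTC CGC TTT CCT CTC TAT GGG CAG TCG GTG AT"
P1_BOTTOM = "ATC ACC GAC TGC CCA TAG AGA GGA AAG CGG AGG CGT AGT GG*T*T"
BC1_TOP = "CCA TCT CAT CCC T*G*C GTG TCT CCG ACT CAG CTA AGG TAA CGA T"
BC1_BOTTOM = "ATC GTT ACC TTA GCT GAG TCG GAG ACA CG*C"

if __name__ == "__main__":
    print("P1 adapter paired bases:", duplex_length(P1_TOP, P1_BOTTOM))
    print("Barcode 1 paired bases:", duplex_length(BC1_TOP, BC1_BOTTOM))
```

A result of 0 for any pair would suggest a typing error in one of the two strands; a nonzero result only confirms complementarity, not that the barcode itself is the intended one.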


8. Ion PGM™ Template OT2 200 kit for template preparation.
9. Ion PGM™ IC 200 kit for library sequencing.
10. Ion PGM™ Chips, which can vary in capacity depending on target size and number of samples, stored at room temperature.
11. (Optional) Pippin Prep 2% Agarose Gel cassette (Sage Science).

2.3 Equipment

1. Thermocyclers, one in the clean lab and one in the modern DNA lab.
2. Microcentrifuge, in the clean lab.
3. Magnetic rack, one in the clean lab and one in the modern DNA lab.
4. Real-time PCR (qPCR) thermal cycler in the modern DNA lab.
5. Ion Torrent™ template preparation equipment: either OneTouch™ or Ion Chef™.

3 Methods

It is strongly encouraged to incorporate library blanks during each step of the protocol.

3.1 Adapter and Barcode Preparation

1. Prepare P1 adapter mix as follows (200 μM):
   Oligo Hybridization Buffer (10×): 10 μL
   P1 adapter 5′–3′: 40 μL
   P1 adapter 3′–5′: 40 μL
   ddH2O: 10 μL
2. Prepare Barcoded adapter mix as follows (200 μM):
   Oligo Hybridization Buffer (10×): 10 μL
   PGM barcode X 5′–3′: 40 μL
   PGM barcode X 3′–5′: 40 μL
   ddH2O: 10 μL
3. Incubate each reaction mix for 10 s at 95 °C, and then ramp down from 95 °C to 12 °C at a rate of 0.1 °C/s.
4. Create "ready-to-use" solutions of 25 μM of P1 adapter and of each barcoded adapter by diluting 2.5 μL of product with 17.5 μL of ddH2O.
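The working concentrations quoted in this protocol follow from simple dilution arithmetic (concentration × volume is conserved). The sketch below is an illustration only; the function name is hypothetical. It reproduces the 25 μM ready-to-use adapter solution from the 200 μM annealed mix, and the 10 μM primer dilution used later for amplification (10 μL of 100 μM stock in 90 μL of ddH2O).

```python
def diluted_concentration(stock_conc: float, stock_vol: float, diluent_vol: float) -> float:
    """Concentration after mixing stock_vol of stock with diluent_vol of diluent.

    Volumes must share one unit (here μL); the result has the unit of stock_conc.
    """
    if stock_vol <= 0 or diluent_vol < 0:
        raise ValueError("stock volume must be > 0 and diluent volume >= 0")
    return stock_conc * stock_vol / (stock_vol + diluent_vol)


# Ready-to-use adapter: 2.5 μL of 200 μM annealed adapter + 17.5 μL ddH2O
adapter_working_uM = diluted_concentration(200.0, 2.5, 17.5)

# Amplification primer: 10 μL of 100 μM stock + 90 μL ddH2O
primer_working_uM = diluted_concentration(100.0, 10.0, 90.0)

if __name__ == "__main__":
    print(f"adapter working solution: {adapter_working_uM:.1f} μM")
    print(f"primer working solution:  {primer_working_uM:.1f} μM")
```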

3.2 Blunt-Ending DNA Fragments

For starting DNA concentration and volume modifications, please see Notes 1 and 2.

1. Set up master mix for blunt-end repair. Include per reaction:
   (a) 5.0 μL of 5× End Repair Buffer (Ion Xpress™ kit).
   (b) 0.25 μL of Blunt Ending Enzyme (Ion Xpress™ kit).
2. Distribute 5.25 μL of the mix into each labelled PCR tube.
3. Add 19.5 μL of DNA extract to each individual PCR tube (or, if less template is available, bring the final volume to 25 μL by adding ddH2O).
4. Incubate the mixture in a thermocycler for 20 min at 20 °C, and then heat-inactivate the enzyme by incubating for 20 min at 72 °C (see Note 3).

3.3 Adapter Ligation

1. Remove tube with blunt-ended DNA from the thermocycler and keep on ice.
2. Prepare adapter ligation reaction mix, including per reaction:
   (a) 2.5 μL of 10× Ligase Buffer (Ion Xpress™ kit).
   (b) 1.0 μL of P1 adapter.
   (c) 0.5 μL of dNTP Mix.
   (d) 0.5 μL of DNA Ligase.
   (e) 2.0 μL of Nick Repair Polymerase (Ion Xpress™ kit).
3. Distribute 6.5 μL of the mix into a new PCR tube.
4. Add 1.0 μL of the chosen PGM Barcode X to the PCR tube of each respective sample.
5. Add 20.5 μL of blunt-ended DNA from Subheading 3.2.
6. Incubate the mixture in a thermocycler for 15 min at 25 °C, followed by 5 min at 72 °C (see Note 3).

3.4 Bead Purification

This purification step can be performed at room temperature (see Note 4).

1. Prepare a fresh 70% ethanol solution, with a final volume of at least 500 μL × number of samples.
2. Add 40 μL of SPRIselect beads to the product from Subheading 3.3, for a final ratio of 1.8 (which should result in a broad range of recovered fragment sizes and an upper limit of approx. 200 bp).
3. Mix thoroughly by pipetting, and let the tubes incubate for 1 min in the magnetic rack.
4. Remove supernatant without disturbing the magnetic beads.
5. Add 500 μL of freshly prepared 70% ethanol to each sample, mix well, and let the tubes incubate until all beads are captured by the magnetic rack.
6. Remove supernatant without disturbing the magnetic beads, and let them air dry for a maximum of 5 min.
7. Add 20 μL of ddH2O to elute DNA, mix well, and let the tubes rest for another minute in the magnetic rack.
8. Remove the eluate and transfer it into a new (labelled) tube; keep on ice until the next reaction.

3.5 Amplification

1. Dilute primers by adding 10 μL of primer stock solution (at 100 μM) to 90 μL of ddH2O.
2. For the amplification of libraries, set up a master mix containing, per parallel reaction:
   (a) 7.8 μL of ddH2O.
   (b) 2.0 μL of AmpliTaq Buffer (10×).
   (c) 0.2 μL of dNTPs (25 mM each).
   (d) 1.6 μL of MgCl2 (25 mM).
   (e) 0.2 μL of BSA (10 mg/mL).
   (f) 1.5 μL of each primer (10 μM each).
   (g) 0.2 μL of AmpliTaq Gold (5 U/μL).
3. Distribute 15 μL into each PCR tube, and add 5.0 μL of the purified product from Subheading 3.4 (see Note 5).
4. In the modern DNA lab, set up and run the thermocycler with the following protocol: denaturation for 10 min at 94 °C, followed by 15 cycles of 30 s at 94 °C, 45 s at 60 °C, and 45 s at 72 °C, and a final extension at 72 °C for 5 min (see Note 6).
5. Pool all parallel amplifications for the same sample, and repeat Subheading 3.4 with a 1.0 ratio, eluting the sequencing libraries in 20 μL (see Note 7).
6. Visualize library concentration and fragment size distribution with, e.g., TapeStation HS D1000 or a 3% agarose gel (see Note 8).
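Scaling the amplification master mix to several samples, each with four parallel reactions (see Note 5), is a common source of pipetting mistakes. The following sketch multiplies the per-reaction volumes listed above by the number of reactions. The 10% pipetting excess is an assumption added here for illustration and is not part of the protocol; the primer names are taken from Table 1 and Note 6.

```python
# Per-reaction volumes (μL) as listed in Subheading 3.5, step 2.
PER_REACTION_UL = {
    "ddH2O": 7.8,
    "AmpliTaq Buffer (10x)": 2.0,
    "dNTPs (25 mM each)": 0.2,
    "MgCl2 (25 mM)": 1.6,
    "BSA (10 mg/mL)": 0.2,
    "Primer A_amp (10 uM)": 1.5,
    "Primer P1_amp (10 uM)": 1.5,
    "AmpliTaq Gold (5 U/uL)": 0.2,
}


def master_mix(n_samples: int, parallel_reactions: int = 4, excess: float = 0.10) -> dict:
    """Total volume (μL) of each component for all reactions.

    excess is a hypothetical pipetting reserve (0.10 = 10% extra); set it to
    0.0 to obtain the exact volumes.
    """
    if n_samples < 1 or parallel_reactions < 1 or excess < 0:
        raise ValueError("need at least one sample and reaction, and excess >= 0")
    n = n_samples * parallel_reactions * (1.0 + excess)
    return {name: round(vol * n, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    # Example: 6 samples, 4 parallel reactions each, 10% reserve.
    for name, vol in master_mix(6).items():
        print(f"{name:<26}{vol:>8.2f} μL")
    print(f"Aliquot per tube: {sum(PER_REACTION_UL.values()):.1f} μL")
```

The per-reaction volumes sum to the 15 μL that step 3 distributes into each tube, which is a quick consistency check on the table.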

3.6 Sequencing

1. Pool all libraries in equimolar amounts, and quantify the molarity of the complete range of pooled fragments with, e.g., TapeStation HS D1000 (see Note 9).
2. Dilute the library pool to 20–23 pM, and prepare the chips using either OneTouch™ or Ion Chef™. Sequence the loaded chips within 24 h.

4 Notes

1. Commercial protocols for library building provided by Thermo Fisher are optimized for only two initial DNA amounts (1 μg and 100 ng). Often such high amounts cannot be obtained from low-quality samples, so we suggest starting with 100 ng or the highest possible volume of extracted DNA.
2. Although starting DNA concentration can vary, do not dilute the initial DNA extract, but rather add water accordingly to bring the mix up to the final volume.
3. Utilizing heat-inactivation of enzymes allows for the elimination of purification steps prior to amplification. This avoids unnecessary loss of short DNA fragments during purification.
4. We have tested silica column-based purifications (e.g., MinElute, Qiagen) and have concluded that column-based methods retain smaller fragments less efficiently, because the columns are optimized for a cutoff of 80 bp. The advantage of bead purification is that the ratio of beads can be adapted to recover a larger range of fragment sizes.
5. The volumes suggested here are based on four parallel amplifications per sample. If you adjust the number of parallel reactions, you must also adjust the volumes of template DNA.
6. Estimate the optimum number of cycles for each sample by qPCR using primers A_amp and P1_amp. It is noteworthy that increasing the number of cycles much above the suggested 15 cycles increases fragment clonality, which in turn decreases the amount of data obtained.
7. After pooling and purifying each library, the final product should be around 10 nM, although lower molarity is also acceptable due to the low starting amount necessary for loading each sequencing chip. If an adapter dimer peak is visible even after purification, include it in the calculations for concentration of the final sequencing pool.
8. Libraries prepared using this method are ready to be sequenced. However, if the fragment size range after pooling is too wide or an adapter/primer dimer is still present in the sequencing pool, it is suggested to include a size selection step (e.g., using Pippin Prep or a gel extraction). This step, while costly, increases sequencing success.
9. Do not dilute each individual library to the final concentration, as small volumes are prone to pipetting errors. Dilute each sample to 4 or 2 nM, and pool them in equimolar amounts. After measuring the region molarity, do serial dilutions of the pool to the final concentration of 20–23 pM.
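The pooling and dilution arithmetic in Notes 7 and 9 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses the common approximation of 660 g/mol per base pair of double-stranded DNA to convert a mass concentration into molarity, and all function names and the example values are hypothetical. Instruments such as the TapeStation report region molarity directly, in which case only the last two helpers are needed.

```python
def molarity_nM(conc_ng_per_ul: float, mean_length_bp: float, bp_mass: float = 660.0) -> float:
    """Approximate molarity (nM) of a dsDNA library.

    Assumes an average mass of bp_mass g/mol per base pair (approximation).
    """
    return conc_ng_per_ul * 1e6 / (bp_mass * mean_length_bp)


def pooling_volumes(libraries_nM: dict, fmol_each: float) -> dict:
    """Volume (μL) of each library contributing fmol_each femtomoles.

    1 nM equals 1 fmol/μL, so volume = amount / concentration.
    """
    return {name: fmol_each / conc for name, conc in libraries_nM.items()}


def fold_dilution(pool_nM: float, target_pM: float) -> float:
    """Total dilution factor needed to bring a pool from nM down to target_pM."""
    return pool_nM * 1000.0 / target_pM


if __name__ == "__main__":
    # Hypothetical libraries, already diluted to 4 or 2 nM as suggested in Note 9.
    libs = {"sample_A": 4.0, "sample_B": 2.0}
    print(pooling_volumes(libs, fmol_each=20.0))
    # A 4 nM pool diluted to 20 pM; best split into serial steps, e.g. 1:10 then 1:20.
    print(fold_dilution(4.0, 20.0))
```

Splitting the total factor into two or more serial steps keeps every pipetted volume large enough to be accurate, which is the concern raised in Note 9.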

References

1. Pääbo S, Poinar H, Serre D et al (2004) Genetic analyses from ancient DNA. Annu Rev Genet 38(1):645–679
2. Ion Torrent by Thermo Fisher (2016) Ion Xpress Plus gDNA fragment library preparation: user guide. Thermo Fisher. MAN0009847
3. Taylor PG (1996) Reproducibility of ancient DNA sequences from extinct Pleistocene fauna. Mol Biol Evol 13:283–285
4. Maricic T, Whitten M, Pääbo S (2010) Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5(11):e14004

Chapter 11

Whole-Genome Capture of Ancient DNA Using Homemade Baits

Gloria González Fortes and Johanna L. A. Paijmans

Abstract

For many archaeological and paleontological samples, the relative content of endogenous compared to contaminant DNA is low. In such cases, enriching sequencing libraries for endogenous DNA prior to sequencing can make the final research project more cost-effective. Here, we present an in-solution enrichment protocol based on homemade baits that can be applied to recover complete nuclear genomes from ancient remains. The approach is based on the preparation of DNA baits by biotinylated adapter ligation. The procedure has been developed for use with human remains but can be adapted to other species or target regions by choosing the appropriate template DNA from which to build the capture baits. By using homemade rather than commercially acquired baits, this protocol may offer increased flexibility and cost efficiency.

Key words Ancient DNA, Whole-genome target enrichment, Homemade capture baits, Hybridization capture

1 Introduction

The field of ancient DNA (aDNA) research has developed rapidly during recent years, mainly due to advances in next-generation sequencing (NGS) coupled with the optimization of methods to extract short and damaged DNA molecules [1, 2]. New protocols for library preparation have improved the conversion of short aDNA molecules into NGS libraries [3, 4]. Optimized sampling strategies have also targeted tissues with reduced levels of contamination, particularly the petrous bone [5, 6] and the dense outer layer of long bones [7]. Altogether, these improvements are leading to an ever-increasing number of ancient nuclear genomes (e.g., [8–12]).

Despite these advances, when extracts with high endogenous DNA fractions are not available, for example, due to the age or preservation conditions of the specimen, the recovery of both nuclear and mitochondrial genomic data remains challenging. In

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_11, © Springer Science+Business Media, LLC, part of Springer Nature 2019


such cases, enrichment procedures, or hybridization capture, can be performed prior to sequencing to increase the relative ratio of endogenous or target DNA compared to contaminant DNA and thus strongly reduce the cost of sequencing. The technique is based on complementary binding of the aDNA fragments to DNA or RNA baits with high sequence similarity to the region of interest. Most capture experiments have thus far focused on the recovery of complete mitochondrial genomes [13–15], nuclear exomes [16, 17], or panels of nuclear single nucleotide polymorphisms (SNPs) from ancient human remains [8, 18–20]. For the analysis of human aDNA, array-based capture experiments have produced a large quantity of SNP data covering a wide time range and extensive geographic areas in Europe [18] and the Middle East [21, 22]. An alternative approach is to target entire nuclear genomes, which would increase the number of sequenced polymorphic sites and may be less prone to ascertainment bias than predetermined SNP panels [23]. Here, we describe a method to prepare homemade baits for whole-genome enrichment of ancient human samples. Unlike other protocols that use RNA baits transcribed from genomic DNA libraries [19, 23–25], the protocol presented here uses DNA baits that are built from commercial and purified human DNA ligated to biotinylated adapters [26, 27]. The DNA baits are ligated to capture-specific biotinylated adapters (Table 1) and subsequently hybridized to the template molecules. Bait and target are then immobilized on magnetic streptavidin-coated beads for purification, during which unhybridized molecules (i.e., off-target DNA) are washed away. After the washing steps, the endogenous libraries are eluted from the magnetic beads and amplified for sequencing. The use of DNA baits in the protocol presented here eliminates the transcription step from DNA into RNA baits, making this protocol shorter and easier to perform than other protocols. 
Furthermore, the library is hybridized with adapter-specific blocking oligos and Cot-1 DNA at 37 °C [28] prior to adding the capture baits for the bait-target hybridization at 65 °C. The protocol described below is designed for whole-genome enrichment of single-stranded libraries from ancient human samples [3]. However, with the appropriate blocking adapters, PCR primers, and/or bait DNA, the protocol can be adapted for use with different library preparation methods and different targets, such as capturing mitochondrial genomes from double-stranded libraries.


Table 1 Sequences for all the oligonucleotides required for the hybridization capture experiment (adapted from [26, 29]; see also Chapters 8, 9, and 14) (see Notes 11 and 12)

Name | Sequence | Comments

Bait preparation
Bio-T | Biotin-TCAAGGACATCC*G | From [26]; * indicates a PTO bond
B | CGGATGTCCTT*G | From [26]; * indicates a PTO bond

Amplification primers
IS5 | AAT GAT ACG GCG ACC ACC GA |
IS6 | CAA GCA GAA GAC GGC ATA CGA |
IS7 | ACA CTC TTT CCC TAC ACG AC |
IS8 | GTG ACT GGA GTT CAG ACG TGT |
CL72 (custom R1 sequencing primer for single-stranded libraries) | ACA CTC TTT CCC TAC ACG ACG CTC TTC C | See Note 11
Gesaffelstein (custom Index2 sequencing primer for single-stranded libraries) | GGA AGA GCG TCG TGT AGG GAA AGA GTG T | See Note 11

Blocking oligos
BO_P5-SS_ext_F | ATC TCG TAT GCC GTC TTC TGC TTG-Pho | 3′-end phosphate (see Note 12)
BO_P5-SS_ext_R | CAA GCA GAA GAC GGC ATA CGA GAT-Pho | 3′-end phosphate (see Note 12)
BO_P5-SS_trunc_F | ACA CTC TTT CCC TAC ACG ACG CTC TTC C-Pho | 3′-end phosphate (see Note 12)
BO_P5-SS_trunc_R | G GAA GAG CGT CGT GTA GGG AAA GAG TG-Pho | 3′-end phosphate (see Note 12)
BO_P7_trunc_F | AGA TCG GAA GAG CAC ACG TCT GAA CTC CAG TCA C-Pho | 3′-end phosphate (see Note 12)
BO_P7_trunc_R | GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T-Pho | 3′-end phosphate (see Note 12)
BO_P7_ext_F | ATC TCG TAT GCC GTC TTC TGC TTG-Pho | 3′-end phosphate (see Note 12)
BO_P7_ext_R | CAA GCA GAA GAC GGC ATA CGA GAT-Pho | 3′-end phosphate (see Note 12)

2 Materials

1. Oligonucleotides required for capture, amplification, and sequencing (Table 1).

2.1 Reagents

1. Oligo hybridization buffer (10×): 500 mM NaCl, 10 mM Tris–HCl (pH 8.0), 1 mM EDTA (pH 8.0).
2. B&W buffer: 1 M NaCl, 5 mM Tris–Cl, 0.5 mM EDTA.
3. BWT buffer: 1 M NaCl, 10 mM Tris–Cl, 1 mM EDTA, 0.05% Tween-20, pH 8.0.
4. HW buffer: 200 μl 10× PCR buffer, 200 μl MgCl2 (25 mM), 1.6 ml H2O.
5. TE buffer (e.g., Life Science AMRESCO Tris–EDTA Buffer, pH 7.0).
6. TET buffer: 1× TE buffer, 0.05% Tween.
7. 5 μg of purified genomic human DNA (e.g., Promega).
8. Nuclease-free water.
9. D1000 Screen Tape kit.
10. The MinElute PCR purification kit (Qiagen, cat no. 28004): contains Qiagen MinElute purification spin columns, PB binding buffer, PE washing buffer, and EB elution buffer, as well as loading dye and pH indicator (not used). Before use, add ethanol (96–100%) to PE buffer (see bottle label for volume).
11. Buffer Tango (10×; Life Technologies).
12. dNTP (25 mM each).
13. ATP (100 mM).
14. T4 polynucleotide kinase (10 U/μl).
15. T4 DNA polymerase (0.1 U/μl).
16. T4 DNA ligase buffer (10×).
17. PEG-4000 (50%).
18. T4 DNA ligase (5 U/μl).
19. Human Cot1 (1 μg/μl).
20. Oligo aCGH and ChIP-on-Chip Hybridization Kit (Agilent, cat no. 5188-5220): contains 2× Hi-RPM Hybridization Buffer and 10× Oligo aCGH/ChIP-on-Chip Blocking Agent. Resuspend the Blocking Agent by adding 1350 μl of nuclease-free water, incubate at room temperature for 60 min, and mix gently.
21. Dynabeads MyOne C1 (Thermo Fisher Scientific, cat. no. 65001).
22. HPLC grade water.
23. NaCl (5 M).
24. Tris–HCl (1 M).
25. EDTA (0.5 M).
26. Herculase buffer (5×).
27. Herculase Fusion II polymerase; GeneAmp™ 10× PCR Gold Buffer and MgCl2 (Life Technologies, cat no. 4306898).
28. SYBRGreen qPCR Master Mix (Thermo Fisher, cat no. 4344463).

2.2 Equipment

1. Microcentrifuges in both ancient and modern labs (e.g., Labnet Spectrafuge 24D).
2. Quantitative PCR (e.g., PikoReal™ Real-Time PCR System).
3. Covaris S220 Ultrasonicator (or comparable).
4. TapeStation 2200.
5. Qubit 2.0 or other DNA spectrophotometer.
6. Thermocycler.
7. Hybridization oven.
8. Magnetic rack for PCR tubes.
9. Magnetic rack for 1.5 ml tubes.
10. Vortex.

2.3 Preparation of the Biotinylated Adapters (Table 1)

Prepare the biotinylated adapter mix as follows:

1. 12 μl of oligo Bio-T (1000 μM).
2. 12 μl of oligo B (1000 μM).
3. 3 μl of oligo hybridization buffer (10×).
4. 3 μl of H2O.
5. Incubate the mix for 5 s at 95 °C, followed by a ramp from 95 °C to 15 °C at a rate of 0.1 °C/s.
6. After incubation, add 210 μl of nuclease-free water to the mix.
7. Store at −20 °C.
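As a check on the recipe above, the concentration of each oligo can be worked out from the listed volumes, assuming the volumes are additive. The annealing reaction has a volume of 12 + 12 + 3 + 3 = 30 μl, and the final mix a volume of 30 + 210 = 240 μl:

```latex
\[
c_{\text{anneal}} = \frac{12\,\mu\text{l} \times 1000\,\mu\text{M}}{30\,\mu\text{l}} = 400\,\mu\text{M},
\qquad
c_{\text{final}} = \frac{12\,\mu\text{l} \times 1000\,\mu\text{M}}{(30 + 210)\,\mu\text{l}} = 50\,\mu\text{M}
\]
```

The result agrees with the 50 μM adapter mix called for in the bait ligation step (Subheading 3.1, step 4).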

3 Methods

3.1 Preparation of Biotinylated Baits (See Note 1)

1. Shear 5 μg of purified human DNA (Promega) to a fragment size range of 100–200 bp. On a Covaris S220 ultrasonicator, the shearing conditions are as follows: 7 min at 175 W peak incident power, 10 duty factor, 200 cycles/burst. Repeat the sonication two times. Check the fragment length of the sheared DNA on a TapeStation with the D1000 DNA Screen Tape kit (Agilent). The average fragment size should be around 150 bp, with a fragment distribution ranging approximately from 50 to 300 bp. If a TapeStation is not available, the size of the sonicated DNA can be tested on a Bioanalyzer or a 1–2% agarose gel.
2. Purify the sonicated products on a MinElute (Qiagen) column according to the manufacturer's instructions, using 25 μl of EB for the final elution. Measure the concentration of the purified products using a Qubit or another DNA spectrophotometer.
3. Blunt-end repair of the sonicated products. Prepare the following reaction mix in a total volume of 40 μl:
   (a) 4 μl of Buffer Tango (10×).
   (b) 0.2 μl of dNTP (25 mM each).
   (c) 0.4 μl of ATP (100 mM).
   (d) 2 μl of T4 polynucleotide kinase (10 U/μl).
   (e) 0.8 μl of T4 DNA polymerase (0.1 U/μl).
   (f) H2O up to 15 μl.
   Vortex the master mix, and add the total volume (15 μl) to 25 μl of purified sonicated products (around 1 μg of sonicated DNA; see Note 2). Incubate in a thermal cycler for 20 min at 25 °C, followed by 20 min at 72 °C.
4. Prepare the following master mix for the ligation of the biotinylated adapters:
   (a) 8 μl of T4 DNA ligase buffer (10×).
   (b) 8 μl of PEG-4000 (50%).
   (c) 4 μl of adapter mix (50 μM).
   (d) 18 μl of H2O.
   Vortex the master mix and add 38 μl to the already inactivated reaction from step 3. Mix by vortexing and then add 2 μl of T4 DNA ligase (5 U/μl) (see Note 3). Incubate in a thermal cycler at 22 °C for 30 min (see Note 4).
5. After the incubation, purify the ligated products through MinElute columns. As the total volume of the ligation reaction is too high to be purified in just one column, the total volume can be halved and purified in two MinElute columns. Elute each column with 25 μl of EB, and then pool the eluted volumes, reaching a total of 50 μl.
6. Verify the purified library on the TapeStation using the D1000 Screen Tape kit. Because of the ligation of the adapters, the fragment size of the baits should be longer than that of the sonicated DNA. If the incorporation of the adapters was successful, the fragment length distribution should be between 50 and 400 bp.
7. Quantify the concentration of the baits using a Qubit.

3.2 Hybridization of Libraries and Baits (See Note 5)

1. For each reaction, set up the following hybridization master mix in a PCR tube:
   (a) 10:1 mixture of bait to library (approximately 500 ng of aDNA library should be used; see Notes 6 and 7).
   (b) 0.6 μl of each blocking oligo (2 μM each).
   (c) 6 μl of Human Cot1 (0.1 μg/μl) (see Note 8).
   (d) 6 μl blocking agent (1×).
   (e) 30 μl Hi-RPM hybridization buffer (1×).
   (f) H2O to a total volume of 60 μl, accounting for the library and bait volumes (H2O, bait, and library volumes together should be 13.2 μl).
   Add the components in the order listed, to prevent the precipitation of the DNA.
2. Incubate in a thermal cycler for 3 min at 95 °C.
3. Pre-cool a thermal cycler to 37 °C. After the denaturation in step 2, transfer the tubes to this thermal cycler at 37 °C, and incubate for 30 min. This step will allow the blocking oligos and Human Cot1 to hybridize with the library.
4. Ten minutes before the incubation from step 3 is over, preheat a thermal cycler to 95 °C. Incubate the baits in this thermal cycler at 95 °C for 4 min.
5. Quickly transfer the denatured baits to a cold block, and add the corresponding volume to have approximately 50 ng of baits to each hybridization mix from step 3. Mix gently.
6. Incubate the baits and the hybridization mix in a thermal cycler at 65 °C for 48 h (see Note 9).
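The volume budget of the hybridization mix in step 1 can be checked with a few lines of arithmetic. The sketch below is an illustration only. It assumes that all eight blocking oligos listed in Table 1 are added at 0.6 μl each, an assumption that reproduces the 13.2 μl the protocol leaves for water, bait, and library; the function and constant names are hypothetical.

```python
TOTAL_UL = 60.0
N_BLOCKING_OLIGOS = 8  # assumption: all eight BO_* oligos from Table 1 are used

# Fixed components of the hybridization mix (μL), from Subheading 3.2, step 1.
FIXED_UL = {
    "blocking oligos": 0.6 * N_BLOCKING_OLIGOS,
    "Human Cot1": 6.0,
    "blocking agent": 6.0,
    "Hi-RPM hybridization buffer": 30.0,
}

# Volume left for water, bait and library together.
REMAINING_UL = TOTAL_UL - sum(FIXED_UL.values())


def water_volume(library_ul: float, bait_ul: float) -> float:
    """Water (μL) needed to bring one hybridization reaction to 60 μL."""
    water = REMAINING_UL - library_ul - bait_ul
    if water < -1e-9:
        raise ValueError("library and bait exceed the available volume; concentrate them first")
    return max(round(water, 2), 0.0)


if __name__ == "__main__":
    print(f"Volume available for water + bait + library: {REMAINING_UL:.1f} μL")
    # Hypothetical example: 8 μL of library and 2 μL of bait.
    print(f"Water to add: {water_volume(8.0, 2.0):.1f} μL")
```

If the library and bait together exceed the available volume, the function raises an error rather than returning a negative water volume, signalling that the inputs would need to be concentrated.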

3.3 Preparation of Streptavidin-Coated Beads

1. Always prepare a fresh aliquot of streptavidin-coated beads on the day they are used.
2. Resuspend the stock of Dynabeads MyOne C1 by vortexing vigorously.
3. For each sample, transfer 18 μl of bead suspension into a 2 ml low retention tube.
4. Pellet the beads using a magnetic rack and discard the supernatant.
5. Add 1 ml of B&W buffer and resuspend.
6. Pellet the beads using a magnetic rack and discard the supernatant.
7. Repeat for a total of three washes.
8. Resuspend the beads in the same volume of B&W buffer as in step 3 (i.e., 18 μl per sample).

3.4 Immobilization of the Hybridized Libraries and Washing

1. Warm up the bead aliquots to the hybridization temperature (65 °C) for at least 2 min in the hybridization oven.
2. Incubate each hybridization reaction from Subheading 3.2, step 6 with 18 μl of the magnetic streptavidin-coated beads under constant rotation at 22 °C (or alternatively invert the tubes every 5 min at room temperature) for 40 min.
3. Pellet the beads using a magnetic rack and discard the supernatant, which will contain the off-target DNA.
4. Wash the beads four times using 100 μl of BWT buffer (see Note 10).
5. Pellet the beads in a magnetic rack and discard the supernatant. Wash the beads once in 100 μl of preheated HW buffer and incubate at 50 °C for 2 min in a thermal cycler.
6. Pellet the beads in a magnetic rack and discard the supernatant. Wash the beads once in 100 μl of BWT buffer.
7. Pellet the beads using a magnetic rack and discard the supernatant. Wash the beads with 100 μl of TE buffer.
8. Pellet the beads using a magnetic rack and discard the supernatant. Add 30 μl of TET buffer and resuspend by vortexing.
9. Incubate the bead suspensions for 5 min at 95 °C in a thermal cycler with a heated lid. After incubation, place the tubes quickly in a magnetic rack. Transfer the supernatant, which contains the captured libraries, to a clean tube. This is a pause point, as the captured products can be stored safely at −20 °C.

3.5 Amplification of the Captured Libraries

1. In order to estimate the number of cycles for the amplification of the captured libraries, it is convenient to use a qPCR approach. Prepare a 1:20 dilution of each captured library, and use at least three qPCR replicates per sample. If a qPCR machine is not available, use approximately 15–20 cycles for the amplification of the captured libraries.
2. Set up the qPCR reactions in 10 μl volumes, using 1 μl of 1:20 diluted library as template and 9 μl of the following master mix:
   (a) 5 μl of SYBRGreen Master Mix.
   (b) 0.2 μl of primer IS5 (0.2 μM).
   (c) 0.2 μl of primer IS6 (0.2 μM).
   (d) 3.6 μl H2O to bring the volume up to 10 μl.
3. Set up the amplification reactions, using 30 μl of the eluted capture product as template and 30 μl of the following master mix in each:
   (a) 12 μl Herculase buffer (1×).
   (b) 2.4 μl of primer IS5 (0.4 μM).
   (c) 2.4 μl of primer IS6 (0.4 μM).
   (d) 0.6 μl of dNTP (0.25 mM each).
   (e) 0.6 μl of Herculase Fusion II polymerase.
   (f) 5 μl of post-capture library template.
   (g) H2O up to 60 μl.
   PCR conditions: initial denaturation at 95 °C for 2 min; estimated number of cycles of 95 °C for 30 s, 60 °C for 45 s, and 72 °C for 45 s; and final extension at 72 °C for 3 min.
4. Purify the PCR products of each sample in a single MinElute column. Use 22 μl of EB for the final elution.
5. Check the size of the captured products on the TapeStation, and use a Qubit for the quantification.

3.6 Second Round of Capture (Optional)

1. The amplified enriched libraries can be subjected to a second round of capture in order to further increase the proportion of target molecules. Assuming a theoretical twofold enrichment after the first capture, the ratio of library to bait can be changed from 10:1 to 5:1. For this second round of hybridization, start at Subheading 3.2, step 2 and proceed as above, with the exception of adding ~100 ng of baits in Subheading 3.2, step 3 to capture each library.

3.7 Amplification of the Captured Libraries After Second Round of Capture

1. Amplify and quantify the captured libraries as described in Subheadings 3.2–3.5.

4

Notes 1. The protocol to build the sonicated DNA into bait libraries is based on the protocols by [29] and [26] with the modifications proposed by [4]. The protocol can also be adjusted for other targets, e.g., by using PCR products of specific SNPs or mitochondrial DNA as bait template. 2. During the sonication and subsequent purification some of the starting amount of 5 μg DNA will be lost. 3. Adding the ligase to the master mix only after the template has been added will help reduce the formation of adapter dimers. 4. The biotinylated adapters do not have an overhang, so the Bst fill-in step described in [29] is not required. 5. Always include a capture blank, using water instead of the library template and the average volume of bait.

102

Gloria Gonza´lez Fortes and Johanna L. A. Paijmans

6. Because aDNA extracts usually contain a high proportion of contaminant DNA, we expect aDNA libraries to contain a low percentage of endogenous DNA molecules (potentially less than 1%). To compensate for this, we add an excess of library relative to baits during hybridization. We use a 10:1 ratio of library to bait, which corresponds to a 1:1 ratio of target DNA to bait assuming 10% endogenous DNA.
7. Based on [26], only around 30% of the library-bait hybrids will bind to the magnetic beads, and 1 μl of beads can bind a maximum of 20–25 ng of DNA. Thus, to saturate 5 μl of beads we need around 400–500 ng of the bait-library products.
8. We use Human Cot-1 DNA (1 mg/ml, Life Technologies) to mask repetitive regions in the human genome. We hybridize the library with the blocking oligos and Human Cot-1 DNA before adding the baits.
9. Suitable PCR tubes should be selected for hybridization. The lids should make a tight seal to avoid evaporation during incubation.
10. Washing the product involves the following steps: pellet the beads in a magnetic rack, discard the supernatant, then take the tubes out of the magnetic rack and add the buffer to wash the beads by pipetting up and down.
11. When the single-stranded library preparation by [3] is used, custom sequencing primers may be required for the R1 and Index2 reads [30].
12. The P5 blocking oligos presented here are compatible with the single-stranded, dual-indexed library preparation by [3]. If a double-stranded library preparation protocol was used, these two can be replaced by BO_P5-DS_trunc_F: ACA CTC TTT CCC TAC ACG ACG CTC TTC CGA TCT-Pho and BO_P5-DS_trunc_R: AGA TCG GAA GAG CGT CGT GTA GGG AAA GAG TG-Pho.

References

1. Dabney J, Meyer M, Pääbo S (2013) Ancient DNA damage. Cold Spring Harb Perspect Biol 5:a012567. https://doi.org/10.1101/cshperspect.a012567
2. Gamba C, Hanghøj K, Gaunitz C, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Bradley DG, Orlando L (2016) Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing. Mol Ecol Resour 16:459–469. https://doi.org/10.1111/1755-0998.12470
3. Gansauge M-T, Meyer M (2013) Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat Protoc 8:737–748. https://doi.org/10.1038/nprot.2013.038
4. Fortes GG, Paijmans JLA (2015) Analysis of whole mitogenomes from ancient samples. arXiv:150305074 [q-bio]
5. Gamba C, Jones ER, Teasdale MD, McLaughlin RL, Gonzalez-Fortes G, Mattiangeli V, Domboróczki L, Kővári I, Pap I, Anders A, Whittle A, Dani J, Raczky P, Higham TFG, Hofreiter M, Bradley DG, Pinhasi R (2014) Genome flux and stasis in a five millennium transect of European prehistory. Nat Commun 5:5257. https://doi.org/10.1038/ncomms6257
6. Pinhasi R, Fernandes D, Sirak K, Novak M, Connell S, Alpaslan-Roodenberg S, Gerritsen F, Moiseyev V, Gromov A, Raczky P, Anders A, Pietrusewsky M, Rollefson G, Jovanovic M, Trinhhoang H, Bar-Oz G, Oxenham M, Matsumura H, Hofreiter M (2015) Optimal ancient DNA yields from the inner ear part of the human petrous bone. PLoS One 10:e0129102. https://doi.org/10.1371/journal.pone.0129102
7. Alberti F, Gonzalez J, Paijmans JLA, Basler N, Preick M, Henneberger K, Trinks A, Rabeder G, Conard NJ, Münzel SC, Joger U, Fritsch G, Hildebrandt T, Hofreiter M, Barlow A (2018) Optimized DNA sampling of ancient bones using Computed Tomography scans. Mol Ecol Resour. https://doi.org/10.1111/1755-0998.12911
8. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, Berger B, Economou C, Bollongino R, Fu Q, Bos KI, Nordenfelt S, Li H, de Filippo C, Prüfer K, Sawyer S, Posth C, Haak W, Hallgren F, Fornander E, Rohland N, Delsate D, Francken M, Guinet J-M, Wahl J, Ayodo G, Babiker HA, Bailliet G, Balanovska E, Balanovsky O, Barrantes R, Bedoya G, Ben-Ami H, Bene J, Berrada F, Bravi CM, Brisighelli F, Busby GBJ, Cali F, Churnosov M, Cole DEC, Corach D, Damba L, van Driem G, Dryomov S, Dugoujon J-M, Fedorova SA, Gallego Romero I, Gubina M, Hammer M, Henn BM, Hervig T, Hodoglugil U, Jha AR, Karachanak-Yankova S, Khusainova R, Khusnutdinova E, Kittles R, Kivisild T, Klitz W, Kučinskas V, Kushniarevich A, Laredj L, Litvinov S, Loukidis T, Mahley RW, Melegh B, Metspalu E, Molina J, Mountain J, Näkkäläjärvi K, Nesheva D, Nyambo T, Osipova L, Parik J, Platonov F, Posukh O, Romano V, Rothhammer F, Rudan I, Ruizbakiev R, Sahakyan H, Sajantila A, Salas A, Starikovskaya EB, Tarekegn A, Toncheva D, Turdikulova S, Uktveryte I, Utevska O, Vasquez R, Villena M, Voevoda M, Winkler CA, Yepiskoposyan L, Zalloua P, Zemunik T, Cooper A, Capelli C, Thomas MG, Ruiz-Linares A, Tishkoff SA, Singh L, Thangaraj K, Villems R, Comas D, Sukernik R, Metspalu M, Meyer M, Eichler EE, Burger J, Slatkin M, Pääbo S, Kelso J, Reich D, Krause J (2014) Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513:409–413. https://doi.org/10.1038/nature13673
9. Jones ER, Gonzalez-Fortes G, Connell S, Siska V, Eriksson A, Martiniano R, McLaughlin RL, Llorente MG, Cassidy LM, Gamba C, Meshveliani T, Bar-Yosef O, Müller W, Belfer-Cohen A, Matskevich Z, Jakeli N, Higham TFG, Currat M, Lordkipanidze D, Hofreiter M, Manica A, Pinhasi R, Bradley DG (2015) Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nat Commun 6:8912. https://doi.org/10.1038/ncomms9912
10. Jones ER, Zarina G, Moiseyev V, Lightfoot E, Nigst PR, Manica A, Pinhasi R, Bradley DG (2017) The Neolithic transition in the Baltic was not driven by admixture with early European farmers. Curr Biol 27:576–582. https://doi.org/10.1016/j.cub.2016.12.060
11. González-Fortes G, Jones ER, Lightfoot E, Bonsall C, Lazar C, Grandal-d’Anglade A, Garralda MD, Drak L, Siska V, Simalcsik A, Boroneanţ A, Romaní JRV, Rodríguez MV, Arias P, Pinhasi R, Manica A, Hofreiter M (2017) Paleogenomic evidence for multigenerational mixing between Neolithic farmers and Mesolithic hunter-gatherers in the lower Danube basin. Curr Biol 27:1801–1810.e10. https://doi.org/10.1016/j.cub.2017.05.023
12. Cassidy LM, Martiniano R, Murphy EM, Teasdale MD, Mallory J, Hartwell B, Bradley DG (2016) Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. Proc Natl Acad Sci 113:368–373. https://doi.org/10.1073/pnas.1518445113
13. Paijmans JLA, Barnett R, Gilbert MTP, Zepeda-Mendoza ML, Reumer JWF, de VJ, Zazula G, Nagel D, Baryshnikov GF, Leonard JA, Rohland N, Westbury MV, Barlow A, Hofreiter M (2017) Evolutionary history of Saber-toothed cats based on ancient mitogenomics. Curr Biol 27:3330–3336.e5. https://doi.org/10.1016/j.cub.2017.09.033
14. Fortes GG, Grandal-d’Anglade A, Kolbe B, Fernandes D, Meleg IN, García-Vázquez A, Pinto-Llona AC, Constantin S, de Torres TJ, Ortiz JE, Frischauf C, Rabeder G, Hofreiter M, Barlow A (2016) Ancient DNA reveals differences in behaviour and sociality between brown bears and extinct cave bears. Mol Ecol 25:4907–4918. https://doi.org/10.1111/mec.13800
15. Posth C, Renaud G, Mittnik A, Drucker DG, Rougier H, Cupillard C, Valentin F, Thevenet C, Furtwängler A, Wißing C, Francken M, Malina M, Bolus M, Lari M, Gigli E, Capecchi G, Crevecoeur I, Beauval C, Flas D, Germonpré M, van der Plicht J, Cottiaux R, Gély B, Ronchitelli A, Wehrberger K, Grigorescu D, Svoboda J, Semal P, Caramelli D, Bocherens H, Harvati K, Conard NJ, Haak W, Powell A, Krause J (2016) Pleistocene mitochondrial genomes suggest a single major dispersal of non-Africans and a Late Glacial population turnover in Europe. Curr Biol 26:827–833. https://doi.org/10.1016/j.cub.2016.01.037
16. Burbano HA, Hodges E, Green RE, Briggs AW, Krause J, Meyer M, Good JM, Maricic T, Johnson PLF, Xuan Z, Rooks M, Bhattacharjee A, Brizuela L, Albert FW, de la Rasilla M, Fortea J, Rosas A, Lachmann M, Hannon GJ, Pääbo S (2010) Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328:723–725. https://doi.org/10.1126/science.1188046
17. Castellano S, Parra G, Sanchez-Quinto FA, Racimo F, Kuhlwilm M, Kircher M, Sawyer S, Fu Q, Heinze A, Nickel B, Dabney J, Siebauer M, White L, Burbano HA, Renaud G, Stenzel U, Lalueza-Fox C, de la Rasilla M, Rosas A, Rudan P, Brajkovi D, Kucaneljko GI, Shunkov MV, Derevianko AP, Viola B, Meyer M, Kelso J, Andres AM, Paabo S (2014) Patterns of coding variation in the complete exomes of three Neandertals. Proc Natl Acad Sci 111:6666–6671. https://doi.org/10.1073/pnas.1405138111
18. Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, Fu Q, Mittnik A, Bánffy E, Economou C, Francken M, Friederich S, Pena RG, Hallgren F, Khartanovich V, Khokhlov A, Kunst M, Kuznetsov P, Meller H, Mochalov O, Moiseyev V, Nicklisch N, Pichler SL, Risch R, Rojo Guerra MA, Roth C, Szécsényi-Nagy A, Wahl J, Meyer M, Krause J, Brown D, Anthony D, Cooper A, Alt KW, Reich D (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522:207–211. https://doi.org/10.1038/nature14317
19. Fu Q, Meyer M, Gao X, Stenzel U, Burbano HA, Kelso J, Paabo S (2013) DNA analysis of an early modern human from Tianyuan Cave, China. Proc Natl Acad Sci 110:2223–2227. https://doi.org/10.1073/pnas.1221359110
20. King TE, Fortes GG, Balaresque P, Thomas MG, Balding D, Delser PM, Neumann R, Parson W, Knapp M, Walsh S, Tonasso L, Holt J, Kayser M, Appleby J, Forster P, Ekserdjian D, Hofreiter M, Schürer K (2014) Identification of the remains of King Richard III. Nat Commun 5:5631. https://doi.org/10.1038/ncomms6631
21. Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, Harney E, Stewardson K, Fernandes D, Novak M, Sirak K, Gamba C, Jones ER, Llamas B, Dryomov S, Pickrell J, Arsuaga JL, de Castro JMB, Carbonell E, Gerritsen F, Khokhlov A, Kuznetsov P, Lozano M, Meller H, Mochalov O, Moiseyev V, Guerra MAR, Roodenberg J, Vergès JM, Krause J, Cooper A, Alt KW, Brown D, Anthony D, Lalueza-Fox C, Haak W, Pinhasi R, Reich D (2015) Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528:499–503. https://doi.org/10.1038/nature16152
22. Lazaridis I, Nadel D, Rollefson G, Merrett DC, Rohland N, Mallick S, Fernandes D, Novak M, Gamarra B, Sirak K, Connell S, Stewardson K, Harney E, Fu Q, Gonzalez-Fortes G, Jones ER, Roodenberg SA, Lengyel G, Bocquentin F, Gasparian B, Monge JM, Gregg M, Eshed V, Mizrahi A-S, Meiklejohn C, Gerritsen F, Bejenaru L, Blüher M, Campbell A, Cavalleri G, Comas D, Froguel P, Gilbert E, Kerr SM, Kovacs P, Krause J, McGettigan D, Merrigan M, Merriwether DA, O’Reilly S, Richards MB, Semino O, Shamoon-Pour M, Stefanescu G, Stumvoll M, Tönjes A, Torroni A, Wilson JF, Yengo L, Hovhannisyan NA, Patterson N, Pinhasi R, Reich D (2016) Genomic insights into the origin of farming in the ancient Near East. Nature 536:419–424. https://doi.org/10.1038/nature19310
23. Carpenter ML, Buenrostro JD, Valdiosera C, Schroeder H, Allentoft ME, Sikora M, Rasmussen M, Gravel S, Guillén S, Nekhrizov G, Leshtakov K, Dimitrova D, Theodossiev N, Pettener D, Luiselli D, Sandoval K, Moreno-Estrada A, Li Y, Wang J, Gilbert MTP, Willerslev E, Greenleaf WJ, Bustamante CD (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852–864. https://doi.org/10.1016/j.ajhg.2013.10.002
24. Schroeder H, Ávila-Arcos MC, Malaspinas A-S, Poznik GD, Sandoval-Velasco M, Carpenter ML, Moreno-Mayar JV, Sikora M, Johnson PLF, Allentoft ME, Samaniego JA, Haviser JB, Dee MW, Stafford TW, Salas A, Orlando L, Willerslev E, Bustamante CD, Gilbert MTP (2015) Genome-wide ancestry of 17th-century enslaved Africans from the Caribbean. Proc Natl Acad Sci 112:3669–3673. https://doi.org/10.1073/pnas.1421784112
25. Olalde I, Schroeder H, Sandoval-Velasco M, Vinner L, Lobón I, Ramirez O, Civit S, García Borja P, Salazar-García DC, Talamo S, María Fullola J, Xavier Oms F, Pedro M, Martínez P, Sanz M, Daura J, Zilhão J, Marquès-Bonet T, Gilbert MTP, Lalueza-Fox C (2015) A common genetic origin for early farmers from Mediterranean Cardial and Central European LBK cultures. Mol Biol Evol 32:3132–3142. https://doi.org/10.1093/molbev/msv181
26. Maricic T, Whitten M, Pääbo S (2010) Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5:e14004. https://doi.org/10.1371/journal.pone.0014004
27. Horn S (2012) Target enrichment via DNA hybridization capture. In: Hofreiter M, Shapiro B (eds) Ancient DNA, 1st edn. Humana Press, Totowa, NJ, pp 177–188


28. Hodges E, Rooks M, Xuan Z, Bhattacharjee A, Benjamin Gordon D, Brizuela L, Richard McCombie W, Hannon GJ (2009) Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protocols 4:960–974. https://doi.org/10.1038/nprot.2009.68
29. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010:pdb.prot5448. https://doi.org/10.1101/pdb.prot5448
30. Paijmans JLA, Baleka S, Henneberger K, Taron UH, Trinks A, Westbury MV, Barlow A (2017) Sequencing single-stranded libraries on the Illumina NextSeq 500 platform. arXiv:171111004 [q-bio]

Chapter 12

Generating RNA Baits for Capture-Based Enrichment

Noah Snyder-Mackler, Tawni Voyles, and Jenny Tung

Abstract

Capture-based enrichment techniques have revolutionized genomic analysis of species and populations for which only low-quality or contaminated DNA samples (e.g., ancient DNA, noninvasively collected DNA, environmental DNA) are available. This chapter outlines an optimized laboratory protocol for generating RNA “baits” for genome-wide capture of target DNA from a larger pool of DNA. This method relies on the in vitro transcription of biotinylated RNA baits, which has the dual benefit of eliminating the high cost of synthesizing custom baits and producing a bait set that targets the majority of regions genome-wide. We provide a detailed protocol for the three main steps involved in bait library construction: (1) making a DNA library from a high-quality DNA sample for the organism of interest or a closely related species; (2) using duplex-specific nuclease digestion to reduce the representation of repetitive regions in the DNA library; and (3) performing in vitro transcription of the repetitive region-depleted DNA library to generate biotinylated RNA baits. Where applicable, we include notes and recommendations based on our own experiences.

Key words Whole-genome capture, Biotinylated RNA baits, Targeted enrichment, Repetitive regions, Capture-based enrichment, Genome resequencing

1 Introduction

Ten years ago, the idea that we could understand how our ancestors once interacted with our extinct hominin cousins was science fiction. The explosive growth of sequencing technology, combined with major advances in statistical and population genetics, has changed this picture entirely. Since the publication of the Neandertal genome in 2010 [1], evolutionary biologists have made astonishing progress in reconstructing the landscape of archaic human populations, including populations that left no clear trace except through DNA [2, 3]. This progress stems in large part from new computational and statistical methods for inferring historical events. However, it is also a consequence of major technological advances in genome-scale resequencing of low-quality and low-quantity DNA samples. These approaches are now routinely applied not only for analysis of archaic hominin ancient DNA but also for ancient DNA samples collected from historical human populations

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_12, © Springer Science+Business Media, LLC, part of Springer Nature 2019


[4] or other species (e.g., woolly mammoths [5]) and for noninvasively collected DNA samples from extant populations, which face similar technical challenges [6]. Specifically, DNA samples from all of these sources are often highly degraded, with intact pieces no more than a few hundred base pairs in length. The DNA fragments that are of primary interest are usually intermixed with a much larger quantity of DNA sequences from other sources (usually microbial), making straightforward shotgun sequencing of the entire DNA pool very expensive and impractical for population-based study designs. For example, an average of only 1.2% of the DNA extracted from 2000 to 4000-year-old human remains comes from the human genome [7]. To solve this problem, investigators have increasingly turned to capture-based techniques, which enrich high-throughput sequencing libraries for the targets of interest and thus can substantially decrease the cost of sequencing whole genomes from poor-quality samples. Originally developed with biomedical applications in mind (e.g., whole exome sequencing, [8]), these methods have now been adapted for use with ancient DNA, environmental DNA, and noninvasively collected samples [6, 7, 9, 10]. One of the most important developments has involved a transition from de novo synthesis of capture “baits,” which requires specialized instrumentation and usually represents only a subset of large eukaryotic genomes, to the synthesis of genome-wide capture baits from existing DNA samples [6, 7]. These latter methods take advantage of the ability of bacteriophage-derived T7 polymerase to transcribe RNA from a double-stranded DNA template, provided the template includes a T7 RNA polymerase recognition site. The resulting RNA baits, which incorporate biotin modifications throughout, are short (5 μg of DNA in 50 μL of nuclease-free water (see Note 8). 6. 
End repair (blunt end) the DNA by mixing together 10 μL of KAPA End Repair Buffer, 5 μL of KAPA End Repair Enzyme Mix, 50 μL of sheared DNA (from step 4), and 35 μL of nuclease-free water.
7. Incubate the end repair mixture at 20 °C for 30 min.
   (a) Purify the DNA with room temperature AMPure beads as described in step 4, except resuspend the sample in 30 μL of nuclease-free water.
8. Add A overhangs (“A-tail”) to the DNA by mixing together 5 μL of KAPA A-Tailing buffer, 3 μL of KAPA A-Tailing Enzyme, 30 μL of end-repaired DNA from the previous step, and 12 μL of nuclease-free water.
9. Incubate at 30 °C for 30 min.
10. Purify the DNA with room temperature AMPure beads as described in step 4, with two exceptions: (1) add 90 μL of the bead slurry to 50 μL of sample, and (2) resuspend the sample in 36 μL of nuclease-free water.
11. To confirm successful adapter ligation later, save 1 μL of DNA eluate (from step 10) to run on a Bioanalyzer chip alongside the adapter-ligated DNA generated in step 14 below.
12. Ligate adapters by combining ~35 μL of A-tailed DNA (from step 11), 10 μL of KAPA Ligation Buffer, 1 μL of 25 μM EcoOT7dTV adapters (from Subheading 3.1), and 5 μL of KAPA DNA ligase.
13. Incubate at 20 °C for 15 min.
14. Purify the DNA with room temperature AMPure beads as described in step 4, with two exceptions: (1) add 90 μL of the bead slurry to 50 μL of sample, and (2) resuspend the sample in 30 μL of nuclease-free water.
15. To confirm successful adapter ligation, quantify the concentration of the DNA library and run 1 μL of the DNA eluate from step 14 on a Bioanalyzer DNA 1000 chip along with 1 μL of the pre-adapter ligation DNA saved in step 11. Successful ligation results in an increase in mean fragment size of ~84 bp in the post-adapter ligation sample compared to the pre-adapter ligation library (Fig. 2). The concentration of the adapter-ligated DNA should be between 25 and 250 ng/μL (see Note 8).


Fig. 2 Sample bioanalyzer traces of a dsDNA library before (red) and after (blue) successful adapter ligation. Note the shift of ~84 bp in the means of the two distributions (represented by dashed vertical lines), which reflects successful ligation of the EcoT7 adapter (42 bp) on each end of the fragment (see Fig. 1b, c for detailed schematic)
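The ligation check described above and illustrated in Fig. 2 can be written down as a small helper. This is only a sketch: the expected shift (two 42 bp adapters, ~84 bp) and the 25–250 ng/μL range come from the protocol, whereas the function name `ligation_qc` and the ±15 bp tolerance are assumptions made for illustration and should be adjusted to the resolution of your own Bioanalyzer traces.

```python
ADAPTER_LENGTH_BP = 42                       # per-end adapter length (Fig. 2 caption)
EXPECTED_SHIFT_BP = 2 * ADAPTER_LENGTH_BP    # one adapter on each end -> ~84 bp
CONC_RANGE_NG_PER_UL = (25.0, 250.0)         # acceptable library concentration


def ligation_qc(pre_mean_bp, post_mean_bp, conc_ng_per_ul, tolerance_bp=15.0):
    """Compare pre- and post-ligation mean fragment sizes and the library concentration.

    tolerance_bp is a hypothetical cutoff, not a value given in the protocol.
    """
    shift = post_mean_bp - pre_mean_bp
    low, high = CONC_RANGE_NG_PER_UL
    return {
        "shift_bp": shift,
        "shift_ok": abs(shift - EXPECTED_SHIFT_BP) <= tolerance_bp,
        "conc_ok": low <= conc_ng_per_ul <= high,
    }


if __name__ == "__main__":
    # Example values are invented for illustration only.
    print(ligation_qc(pre_mean_bp=250.0, post_mean_bp=334.0, conc_ng_per_ul=100.0))
```

A library that fails either check is probably better re-examined before moving on to the DSN digestion, since all 14 downstream aliquots are drawn from it.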

3.3 Digest Ligated DNA Library with Duplex-Specific Nuclease (DSN) to Reduce the Representation of Repetitive Regions

1. Preheat the heating block to 70 °C. Place a tube of nuclease-free water, DSN master buffer, and DSN stop solution in the heating block to warm them to 70 °C.
2. Aliquot the eluate from Subheading 3.2, step 14 (~30 μL total), into 14 thin-walled PCR tubes. Each tube should contain 2 μL of eluate.
3. For each of the 14 aliquots from step 2 (2 μL of the adapter-ligated DNA per aliquot), add 1 μL of hybridization buffer and 1 μL of human Cot-1 DNA (see Note 9).
4. Incubate in a thermocycler for 3 min at 98 °C, followed by 4 h at 65 °C. Ensure that the lid temperature is set at 105 °C to minimize evaporation.
5. Add 4 μL of nuclease-free water (pre-warmed to 70 °C), 1 μL of DSN master buffer (pre-warmed to 70 °C), and 1 μL of DSN enzyme to each of the 14 aliquots.
6. Incubate for 20 min at 65 °C.
7. Add 5 μL of DSN stop solution (pre-warmed to 70 °C) to each of the 14 aliquots.
8. Purify the DNA with room temperature AMPure beads as described in Subheading 3.2, step 4, with three exceptions: (1) add 36 μL of the bead slurry to 15 μL of sample, (2) incubate the bead-sample mix at room temperature for 15 min (see Note 10), and (3) resuspend the sample in 10 μL of nuclease-free water.
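The bead cleanups in this protocol use different bead-to-sample ratios, which matters if you substitute other beads and need to recalibrate (see Note 2). The sketch below simply recomputes the ratios implied by the stated volumes and confirms that the DSN reaction adds up to the 15 μL sample volume used in step 8. The names `bead_ratio`, `bead_volume`, and `DSN_REACTION_UL` are illustrative and not part of the protocol.

```python
# Per-tube components of the DSN reaction (Subheading 3.3, steps 2-7), in microliters.
DSN_REACTION_UL = {
    "adapter-ligated DNA": 2.0,
    "hybridization buffer": 1.0,
    "human Cot-1 DNA": 1.0,
    "nuclease-free water": 4.0,
    "DSN master buffer": 1.0,
    "DSN enzyme": 1.0,
    "DSN stop solution": 5.0,
}


def bead_ratio(bead_ul, sample_ul):
    """Bead-slurry-to-sample volume ratio for a cleanup step."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return bead_ul / sample_ul


def bead_volume(sample_ul, ratio):
    """Bead slurry volume needed to reach a given ratio for a sample volume."""
    return sample_ul * ratio


if __name__ == "__main__":
    print("DSN reaction volume (uL):", sum(DSN_REACTION_UL.values()))
    print("Ratio, 90 uL beads to 50 uL sample:", round(bead_ratio(90.0, 50.0), 2))
    print("Ratio, 36 uL beads to 15 uL sample:", round(bead_ratio(36.0, 15.0), 2))
```

The post-DSN cleanup uses the higher of the two ratios, alongside the longer bead incubation that Note 10 attributes to the DNA being single-stranded at this point.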


3.4 Klenow Fill-In of DSN-Digested DNA


1. Dilute the EcoOT7dTV_Fwd primer in nuclease-free water by a factor of 1:10 to produce a 15 μM working solution.
2. To each of the 14 tubes of 10 μL DSN-digested DNA (from Subheading 3.3, step 8), add 0.5 μL of the EcoOT7dTV_Fwd primer working solution, 2.5 μL of NEB buffer 2, 0.5 μL of dNTPs, and 10 μL of nuclease-free water.
3. Incubate the tubes in a thermocycler for 2 min at 94 °C, followed by a decrease in temperature at 1 °C/s until the thermocycler reaches 35 °C (this should take 60 s), and hold at 35 °C for 2 min.
4. Decrease the temperature at a rate of 0.5 °C/s until the thermocycler reaches 25 °C (this should take 20 s), and hold at 25 °C for 1 min.
5. Quickly spin down the tubes (1–2 s spin is sufficient) to remove any condensation on the side of the tubes.
6. Add 1 μL of Klenow DNA polymerase to each of the 14 tubes (total volume is 24.5 μL).
7. Incubate in a thermocycler at 37 °C for 90 min and then at 75 °C for 20 min to heat inactivate the enzyme.
8. Prepare a 2% agarose gel (e.g., 200 mL of 1× TAE, 4 g of agarose powder, and 20 μL of SYBR Safe DNA Gel Stain) with 15 wells deep enough to hold 30 μL of liquid.
9. Set the heating block to 55 °C. For each aliquot from step 7, mix 5 μL of DNA gel loading dye with all 24.5 μL of DSN-digested DNA, and carefully pipette this mixture into an empty well of the gel. Pipette 2 μL of 100 bp ladder into the 15th well.
10. Run the gel at 100 V for ~20 min to ensure clear separation of the DNA fragments. The goal here is to be able to cleanly excise 200–300 bp fragments; the runtime may need to be optimized.
11. For each aliquot, excise the 200–300 bp region using a clean razor blade.
12. Purify each aliquot using the Zymoclean Gel DNA Recovery Kit, following the manufacturer’s instructions. Elute the DNA in 12 μL of nuclease-free water.
13. Combine all 14 eluates in a 1.5 mL tube. You should have ~150 μL total volume (see Note 8).
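When programming the primer-annealing ramp in steps 3 and 4, it can help to confirm the ramp durations from the stated rates. The following is a minimal sketch of that arithmetic; `ramp_seconds`, `program_seconds`, and the tuple layout of `ANNEALING_PROGRAM` are illustrative names rather than any thermocycler's API, and real instruments may not hold these ramp rates exactly.

```python
def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to move between two temperatures at a constant ramp rate."""
    if rate_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_s


# (target temperature in deg C, ramp rate used to reach it in deg C/s, hold time in s)
# The first entry has no ramp rate because the protocol does not specify one.
ANNEALING_PROGRAM = [
    (94.0, None, 120.0),   # 2 min at 94 deg C
    (35.0, 1.0, 120.0),    # ramp at 1 deg C/s, hold 2 min
    (25.0, 0.5, 60.0),     # ramp at 0.5 deg C/s, hold 1 min
]


def program_seconds(program):
    """Total duration of the holds plus the controlled ramps between them."""
    total = 0.0
    previous_temp = None
    for temp, rate, hold in program:
        if previous_temp is not None and rate is not None:
            total += ramp_seconds(previous_temp, temp, rate)
        total += hold
        previous_temp = temp
    return total


if __name__ == "__main__":
    print("94 -> 35 deg C ramp (s):", ramp_seconds(94.0, 35.0, 1.0))
    print("35 -> 25 deg C ramp (s):", ramp_seconds(35.0, 25.0, 0.5))
    print("Whole annealing program (s):", program_seconds(ANNEALING_PROGRAM))
```

The first ramp works out to 59 s, in line with the roughly 60 s stated in step 3. The total excludes the time the block needs to reach 94 °C at the start.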

3.5 PCR Amplification of DSN-Digested DNA

1. Using the gel-purified DSN-digested DNA from Subheading 3.4, step 13, aliquot 4 μL into 36 thin-walled PCR tubes (see Note 11).


2. To each tube, add 25 μL of KAPA HiFi HotStart ReadyMix, 1 μL of EcoOT7 PCR 1 primer, 1 μL of EcoOT7 PCR 2 primer, and 19 μL of nuclease-free water.
3. Incubate the reaction mixtures in a thermocycler for 45 s at 98 °C, followed by 16 cycles of 15 s at 98 °C, 30 s at 65 °C, and 30 s at 72 °C. After the 16 cycles, incubate at 72 °C for 1 min, and then hold the sample at 4 °C. Ensure that the thermocycler lid is set at 105 °C for the duration of the PCR.
4. Purify the DNA with room temperature AMPure beads as described in Subheading 3.2, step 4, with two exceptions: (1) add 90 μL of the bead slurry to 50 μL of sample, and (2) resuspend the sample in 30 μL of nuclease-free water.
5. Combine all 36 purified PCR products in a 1.5 mL tube, which should have a total volume of ~1 mL.
6. Quantify the DNA libraries using a Qubit dsDNA HS kit. Estimate the molarity of the sample using the following equation, which assumes an average fragment size of 250 bp: nM = 6.58 × N ng/μL, where N is the concentration (ng/μL) of your sample as measured using the Qubit (see Note 8).

3.6 In Vitro Transcription and Purification

1. Adjust the volume of the DNA libraries to yield a concentration of 140 nM. If the starting concentration is too high, dilute with nuclease-free water; if the starting concentration is too low, concentrate further using a vacufuge (see Note 12).
2. Pipette 7.67 μL aliquots of DNA library into 12 thin-walled PCR tubes. Store the excess DNA library at −80 °C (see Note 13).
3. For each of the 12 DNA library aliquots, add 2 μL of T7 10× reaction buffer, 2 μL each of T7 ATP Solution, T7 CTP Solution, and T7 GTP Solution, as well as 1.34 μL of T7 UTP Solution, 0.99 μL of biotin-UTP, and 2 μL of T7 Enzyme Mix.
4. Incubate the mixtures in a thermocycler at 37 °C for 4 h.
5. Add 1 μL of TURBO DNase to each tube.
6. Incubate in a thermocycler at 37 °C for 15 min.
7. Purify each resulting biotinylated RNA aliquot with the Ambion MEGAclear Kit following the manufacturer’s instructions. All centrifugation steps should be conducted at 10,000 × g. Use the following two-step elution:
   (a) Preheat 110 μL of elution solution per sample to 95 °C.
   (b) Apply 35 μL of the preheated elution solution to the center of the Filter Cartridge, close the cap of the tube, and centrifuge for 1 min at room temperature (10,000 × g) to elute the RNA.


   (c) To maximize RNA recovery, repeat this elution procedure with a second preheated 35 μL aliquot of elution solution. Collect the eluate into the same collection/elution tube.
   (d) These two rounds of elution should result in a total volume of 70 μL.
8. Combine all 12 eluates into a 1.5 mL tube for a total volume of ~840 μL.
9. Save 2 μL to run on a Bioanalyzer at the end of the protocol with the final RNA bait product to confirm successful restriction enzyme digestion.
10. Quantify the RNA using a Qubit RNA High Sensitivity kit.
11. Estimate the molarity of the sample as follows, assuming an average RNA fragment size of ~250 bp. Your sample should be at a concentration of at least 0.1 nM.
   (a) nM = 6 × N ng/μL, where N is the RNA concentration estimated from the Qubit.
12. Aliquot 150 μL of >100 pM RNA baits for Subheading 3.7, step 1, and store the remaining baits (~690 μL) at −80 °C (see Note 13).

3.7 Restriction Enzyme Digestion of the RNA Baits

1. Split the 150 μL of RNA baits (from Subheading 3.6, step 12) into ten aliquots of 14.4 μL in separate tubes.
2. Add 1.6 μL of NEB buffer 4 to each RNA bait aliquot.
3. Incubate the reactions in a thermocycler at 90 °C for 5 min.
4. Incubate at room temperature for 10 min.
5. To each of the ten reactions, add 0.6 μL of nuclease-free water, 2.0 μL of BSA, 0.4 μL of NEB buffer 4, and 1.0 μL of EcoO109I restriction enzyme, for a total volume of 20 μL.
6. Incubate in a thermocycler at 37 °C for 4 h, followed by 65 °C for 20 min to heat inactivate the enzyme.
7. Combine all ten reactions and split into two tubes of 100 μL each.
8. Purify both 100 μL aliquots as described in Subheading 3.6, step 7, to give a total volume of 140 μL in two 70 μL aliquots. These aliquots should be combined into a single product before quantification and storage.
9. Check for successful digestion, and calculate bait library concentration on a Bioanalyzer using the Agilent RNA 6000 Nano Kit, following the manufacturer’s instructions (see Fig. 3).
10. Make aliquots of ~800 ng RNA baits, and store aliquots at −80 °C. This step avoids unnecessary freeze/thaw cycles when performing capture reactions. Following this protocol, we often end up with ~20 μg of RNA baits, which is sufficient for at least 27 multiplexed hybridization reactions (750 ng per hybridization following [6]; see also Note 13).

Fig. 3 Example Bioanalyzer traces (fluorescent units plotted against fragment size in bases) of RNA baits before restriction enzyme (RE) digestion (blue) and after RE digestion (red). Note the decrease in mean bait size (represented by dashed vertical lines) of ~76 bases after RE digestion, which shows that the EcoT7 adapter has been cleaved off of each end of the RNA bait (see Fig. 1b, c for detailed schematic)
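Because the in vitro transcription (Subheading 3.6, steps 2 and 3) and restriction digest (Subheading 3.7, steps 1, 2, and 5) reactions use small, unusual volumes, it can be worth tallying them before pipetting. The sketch below checks that each reaction sums to 20 μL and scales the per-tube volumes to a batch. The dictionary and function names are illustrative, and the optional `overage` parameter is an assumption (the protocol does not prescribe pipetting overage).

```python
# Per-tube volumes in microliters, copied from the protocol steps.
IVT_REACTION_UL = {
    "DNA library (140 nM)": 7.67,
    "T7 10x reaction buffer": 2.0,
    "T7 ATP Solution": 2.0,
    "T7 CTP Solution": 2.0,
    "T7 GTP Solution": 2.0,
    "T7 UTP Solution": 1.34,
    "biotin-UTP": 0.99,
    "T7 Enzyme Mix": 2.0,
}

DIGEST_REACTION_UL = {
    "RNA baits": 14.4,
    "NEB buffer 4 (step 2)": 1.6,
    "nuclease-free water": 0.6,
    "BSA": 2.0,
    "NEB buffer 4 (step 5)": 0.4,
    "EcoO109I": 1.0,
}


def total_volume(components):
    """Sum of all component volumes, rounded to 0.01 uL."""
    return round(sum(components.values()), 2)


def batch_totals(components, n_reactions, overage=0.0):
    """Volume of each component needed for n_reactions tubes.

    overage is a fractional excess (e.g., 0.1 for 10%); this is a
    hypothetical convenience, not something specified in the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in components.items()}


if __name__ == "__main__":
    print("IVT reaction volume (uL):", total_volume(IVT_REACTION_UL))
    print("Digest reaction volume (uL):", total_volume(DIGEST_REACTION_UL))
    print("Reagents for 12 IVT reactions:", batch_totals(IVT_REACTION_UL, 12))
```

The 1 μL of TURBO DNase added after transcription is left out of the tally because it is added to the completed 20 μL reaction.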

4 Notes

1. Do not use diethyl pyrocarbonate (DEPC)-treated water because DEPC can inhibit PCR.
2. We have optimized this protocol for use with the Agencourt AMPure XP beads, but other bead-based purification methods (e.g., homemade SPRI beads) can be substituted. If using other beads, you will need to calibrate the bead-to-DNA/RNA ratio to use in each purification step.
3. We have optimized this protocol for use with the Ambion MEGAclear RNA purification kit, but other RNA purification methods could be substituted.
4. We have optimized this protocol for use with the Diagenode Bioruptor, but other instruments for physical DNA fragmentation (e.g., Covaris) could be substituted.
5. We have optimized this protocol for use with the KAPA Library Preparation Kit, but other kits and reagents for preparing Illumina libraries could be substituted.
6. To avoid the unlikely scenario of RNA bait contamination in the post-enrichment libraries, we advise generating RNA baits from an individual that will not be targeted for later capture-based enrichment.
7. Be sure to allow the Bioruptor to cool to 4 °C prior to shearing. When shearing multiple samples, we give the Bioruptor a 15 min break between every 45 min of shearing to ensure the machine does not overheat, which can lead to inefficient shearing.
8. This is a safe stopping point. Store DNA at −20 °C for up to a week.
9. Because the reactions in Subheading 3.3 (steps 3 through 5) use very small volumes, be sure to add reagents directly to the reaction volume, and make sure no liquid is stuck to the side of the tube when adding reagents and mixing samples.
10. The initial incubation of the sample and bead slurry should be for 15 min instead of 5 min (as in other bead purification procedures in this protocol) because the DNA is now single-stranded.
11. To ensure there is sufficient DNA for later steps, we do replicate PCRs of 4 μL DNA per reaction.
12. The DNA library should be at a concentration of 140 nM. To calculate the target volume for vacufuging each sample, use the following equation: (29 μL) × (Y nmol/L) = (X μL) × (140 nM), where 29 μL is the current volume of the sample, Y nmol/L is the concentration of the sample obtained from the Qubit (see equation in Subheading 3.5, step 6), 140 nM is the desired molarity, and X is the target volume after concentrating the sample with the vacufuge.
13. The excess DNA library or RNA baits from these intermediate steps can be stored at −80 °C for up to 1 year. If desired, saved product from this step can be used to generate additional RNA baits at a later date, starting with the product from Subheading 3.6, step 2 or 12, respectively. We have generated successful RNA baits starting with samples stored at both these intermediate steps.
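The molarity estimate in Subheading 3.5, step 6, and the volume adjustment in Note 12 can be combined in a short calculation. This is a sketch under stated assumptions: the protocol gives only the factor 6.58 for 250 bp fragments, and the molecular-weight approximation used here (607.4 g/mol per base pair plus 157.9 g/mol) is a commonly used rule of thumb that happens to reproduce that factor, not necessarily how the authors derived it. The function names are illustrative.

```python
def dsdna_nanomolar(conc_ng_per_ul, mean_length_bp=250):
    """Approximate molarity (nM) of a dsDNA library from its mass concentration.

    Assumes MW ~= 607.4 * length + 157.9 g/mol (an approximation chosen because
    it yields the protocol's factor of ~6.58 nM per ng/uL at 250 bp).
    """
    mol_weight = 607.4 * mean_length_bp + 157.9
    return conc_ng_per_ul * 1e6 / mol_weight


def target_volume_ul(current_volume_ul, current_nM, target_nM=140.0):
    """Solve V1 * C1 = V2 * C2 for V2, the volume at which the library is at target_nM.

    A result smaller than the current volume means the sample must be
    concentrated (vacufuge); a larger result means it must be diluted.
    """
    if target_nM <= 0:
        raise ValueError("target molarity must be positive")
    return current_volume_ul * current_nM / target_nM


if __name__ == "__main__":
    print("nM per ng/uL at 250 bp:", round(dsdna_nanomolar(1.0), 2))
    # Invented example: a 29 uL sample measured at 70 nM.
    print("Target volume (uL):", target_volume_ul(29.0, 70.0))
```

In the invented example, a 29 μL sample at 70 nM would need to be concentrated to 14.5 μL to reach 140 nM, whereas a sample above 140 nM returns a target volume larger than 29 μL and is diluted with nuclease-free water instead.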

Acknowledgments This work was supported by National Science Foundation grants DEB-1405308 (to J.T.) and SMA-1306134 (to J.T. and N.S.M.). We thank Jacob Gordon, Amanda Shaver, and Michael Yuan for key contributions to the protocol design and optimization and Arielle Fogel and Jen Tinsman for comments on an earlier draft of this chapter.


References

1. Green RE, Krause J, Briggs AW et al (2010) A draft sequence of the Neandertal genome. Science 328(5979):710–722
2. Sawyer S, Renaud G, Viola B et al (2015) Nuclear and mitochondrial DNA sequences from two Denisovan individuals. Proc Natl Acad Sci U S A 112(51):15696–15700
3. Reich D, Green RE, Kircher M et al (2010) Genetic history of an archaic hominin group from Denisova cave in Siberia. Nature 468(7327):1053–1060
4. Schroeder H, Avila-Arcos MC, Malaspinas AS et al (2015) Genome-wide ancestry of 17th-century enslaved Africans from the Caribbean. Proc Natl Acad Sci U S A 112(12):3669–3673
5. Miller W, Drautz DI, Ratan A et al (2008) Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456(7220):387–390
6. Snyder-Mackler N, Majoros WH, Yuan ML et al (2016) Efficient genome-wide sequencing and low coverage pedigree analysis from non-invasively collected samples. Genetics 203(2):699–714
7. Carpenter ML, Buenrostro JD, Valdiosera C et al (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852–864
8. Gnirke A, Melnikov A, Maguire J et al (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 27:182–189
9. Ávila-Arcos MC, Sandoval-Velasco M, Schroeder H et al (2015) Comparative performance of two whole genome capture methodologies on ancient DNA Illumina libraries. Methods Ecol Evol 6(6):725–734
10. Perry GH, Marioni JC, Melsted P et al (2010) Genomic-scale capture and sequencing of endogenous DNA from feces. Mol Ecol 19:5332–5344
11. Vallender EJ (2011) Expanding whole exome resequencing into non-human primates. Genome Biol 12(9):R87
12. Enk JM, Devault AM, Kuch M et al (2014) Ancient whole genome enrichment using baits built from modern DNA. Mol Biol Evol 31:1292–1294

Chapter 13

Hybridization Capture of Ancient DNA Using RNA Baits

André E. R. Soares

Abstract

The majority of DNA recovered from ancient remains is derived from organisms that colonize the remains post-mortem, such as soil microbes, or from contaminants, such as DNA from living humans. Additionally, some ancient DNA research projects aim to target specific genomic regions, such as mitochondrial genomes or variable single nucleotide polymorphisms (SNPs). To overcome the challenge of targeting specific fragments of DNA from within a complex DNA extract, methods have been developed to enrich ancient DNA extracts for target DNA relative to nontarget DNA. This chapter describes a method for target DNA enrichment that uses hybridization to biotinylated RNA baits to capture and amplify specific ancient DNA fragments from within the pool of extracted fragments.

Key words Ancient DNA, DNA capture, RNA bait, Hybridization

1 Introduction

It is common in ancient DNA research that the majority of DNA extracted from an ancient specimen is from contaminants or colonizing organisms such as bacteria and fungi. Such samples are said to have low proportions of endogenous DNA [1–3]. Sequencing-based analyses of DNA from such specimens can quickly become cost prohibitive. One way to increase the yield of endogenous DNA from such samples is through targeted sequencing. Prior to the high-throughput sequencing era, such targeting generally used PCR to amplify specific genomic regions. However, PCR requires longer DNA fragments than are usually recovered from ancient samples. More recently, protocols have been developed that use bait molecules created from known sequenced organisms, such as an extant close relative, to enrich genomic fragments of interest within a DNA extract and thereby increase the proportion of endogenous target DNA in a genomic library. In these approaches, a set of bait molecules or probes can be designed to target certain regions of the genome [4], the entire genome, or the mitochondrial genome [5].

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_13, © Springer Science+Business Media, LLC, part of Springer Nature 2019


This chapter describes a method that hybridizes biotinylated RNA probes (“baits”) to heat-denatured DNA molecules [6, 7] in a genomic library. After hybridizing to the biotinylated probes, the DNA fragments bind to streptavidin-coated magnetic beads [8], which are washed in the presence of a strong magnet. This increases the proportion of target DNA (bound to probes and magnetic beads) relative to off-target DNA (not bound and therefore washed away) in the sequencing library. The final cleaning step in the protocol is performed using solid-phase reversible immobilization (SPRI) beads, allowing for the recovery of short DNA fragments that are often lost during column-based cleanup.

2 Materials

To decrease the risk of contamination, prepare all buffers using molecular biology grade reagents or purchase ready-made solutions. All buffers and solutions should be stored at 4 °C, except for the Denhardt’s solution, which should be stored at −20 °C, and the RNA baits, which should be stored at −80 °C until use. It is recommended to aliquot RNA baits into small batches to avoid freeze/thaw cycles. This protocol requires an Illumina-compatible sequencing library prepared using extracted DNA of the target specimen.

2.1 Chemicals, Reagents, and Consumables

1. 0.5 M EDTA buffer.
2. 20× SSC stock solution: 3 M NaCl, 300 mM trisodium citrate dihydrate at pH 7.0.
3. 20× SSPE buffer: 0.02 M EDTA, 2.98 M NaCl in 0.2 M phosphate buffer at pH 7.4.
4. 50× Denhardt’s solution: 1% bovine serum albumin (BSA), 1% Ficoll 400, 1% polyvinylpyrrolidone (PVP).
5. 1 M Tris–HCl, pH 7.5.
6. 1% SDS.
7. TET buffer: 10 mM Tris–HCl, 0.005% Tween 20.
8. 1 μg/μL human Cot-1 DNA.
9. 1 μg/μL salmon sperm DNA.
10. MYcroarray Blocking Agent solution (see Note 1 for nonproprietary alternative).
11. RNase Block SUPERase-In.
12. Binding buffer: 1 M NaCl, 10 mM Tris–HCl, 1 mM EDTA pH 7.5.
13. Wash Buffer #1: 1× SSC, 0.1% SDS.
14. Wash Buffer #2: 0.1× SSC, 0.1% SDS.
15. RNA probes: Biotinylated RNA baits (Microarray myBaits or custom-made baits; see Note 2).
16. Illumina sequencing primers.
17. Yeast tRNA (10 mg/mL).
18. Ultra clean, nuclease-free, ddH2O.
19. Magnetic bead solution: Dynabeads MyOne Streptavidin C1 (Invitrogen).
20. KAPA Taq HotStart Mix (Merck).
21. 0.2 mL individual, or strip, tubes with attached flat or dome snap shut caps (see Note 3).
22. 0.5 mL nuclease-free, sterile, centrifuge tubes.
23. 1.5 mL nuclease-free, sterile, centrifuge tubes.

2.2 Equipment

1. Thermocycler.
2. Magnetic stand (such as DynaMag-2).
3. Vortex machine.
4. Water bath or heat block set at 65 °C (see Note 4).
5. Benchtop tube rotator.
6. Ice machine.

3 Methods

3.1 Workbench Preparation

During preparation of the solutions, keep every reagent on ice except the 1% SDS solution. Warm the 1% SDS solution to ambient temperature, or up to 30 °C, until the SDS is completely dissolved; the solution may look cloudy until then. Keep this solution at room temperature while preparing the other reagents. Thaw the RNA baits slowly from −80 °C, always keeping them on ice.

Program the thermocycler as follows:

1. 95 °C: 5 min.
2. 65 °C: 5 min (see Note 5).
3. 65 °C: 36 h.

We recommend testing beforehand the 0.2 mL tubes that will be used during the 36 h of hybridization. Add any liquid to the tubes, close the lids tightly, leave the tubes at 65 °C in a thermocycler or heat block for at least 24 h, and then check for liquid loss. If a noticeable volume has evaporated during this test, try other brands of plastic tubes or tube strips.

3.2 DNA Library and Blocking Agents Mix Preparation (Tube 1)

The human Cot-1 DNA, the salmon sperm DNA, and the blocking agent help to prevent nonspecific DNA hybridization. In a clean, sterile tube, prepare a mix containing:

1. 2.5 μL of 1 μg/μL human Cot-1 DNA.
2. 2.5 μL of 1 μg/μL salmon sperm DNA.
3. 0.6 μL of MYcroarray Blocking Agent Solution (or see Note 1).
4. 3.4 μL of your previously prepared Illumina DNA sequencing library.

Mix by vortexing, then spin down briefly to ensure all liquid is at the bottom of the tube, and keep the tube on ice during the next steps. For multiple samples, follow steps 1–3 with the appropriate volume adjustment, and then add each sequencing library to its own tube (see Note 6). The DNA sequencing library should be as concentrated as possible, ideally containing 1–2 μg of DNA.
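The volume adjustment for multiple samples can be sketched as a small calculation. The example below is only an illustration: the function names are hypothetical, and the 10% pipetting overage is an assumption that is not part of the protocol. The per-sample volumes are those listed in steps 1–3 above.

```python
from decimal import Decimal

# Per-sample volumes (µL) of the shared blocking components (steps 1-3 above).
BLOCKING_MIX_UL = {
    "human Cot-1 DNA (1 µg/µL)": Decimal("2.5"),
    "salmon sperm DNA (1 µg/µL)": Decimal("2.5"),
    "blocking agent": Decimal("0.6"),
}
LIBRARY_UL = Decimal("3.4")  # each library is added to its own tube (step 4)


def scale_blocking_mix(n_samples, overage=Decimal("0.10")):
    """Return master-mix volumes (µL) for n_samples plus a pipetting overage.

    The 10% default overage is an assumption, not a value from the protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = Decimal(n_samples) * (Decimal(1) + overage)
    return {name: volume * factor for name, volume in BLOCKING_MIX_UL.items()}


def per_tube_aliquot():
    """Volume (µL) of master mix to dispense per tube before adding the library."""
    return sum(BLOCKING_MIX_UL.values(), Decimal(0))


if __name__ == "__main__":
    for name, volume in scale_blocking_mix(8).items():
        print(f"{name}: {volume:.2f} µL")
    print(f"Dispense {per_tube_aliquot()} µL per tube, then add {LIBRARY_UL} µL of library")
```

With these numbers, each tube receives 5.6 μL of the shared mix plus 3.4 μL of library, for a total of 9.0 μL in Tube 1.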

3.3 Hybridization Solution Preparation (Tube 2)

In a clean, sterile tube, prepare the hybridization mix containing:

1. 0.8 μL of 0.5 M EDTA buffer.
2. 20 μL of 20× SSPE buffer.
3. 8 μL of 50× Denhardt’s solution.
4. 8 μL of 1% SDS.

The final volume is designed to be sufficient for one sample hybridization; scale accordingly in case of multiple samples. Mix by vortexing, spin down the liquid, and keep Tube 2 at room temperature to avoid precipitation.
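As a rough cross-check of the recipe, the working concentrations can be estimated with the usual dilution relation (stock concentration × volume added ÷ total volume). The sketch below is illustrative only: the names are hypothetical, the aliquot volumes (7, 13, and 6 μL) are taken from Subheading 3.5, and the EDTA figure deliberately ignores the EDTA already contained in the SSPE stock.

```python
# (stock concentration, volume in µL) for each Tube 2 component listed above.
TUBE2 = {
    "EDTA (mM, added stock only)": (500.0, 0.8),  # 0.5 M stock
    "SSPE (x)": (20.0, 20.0),
    "Denhardt's (x)": (50.0, 8.0),
    "SDS (%)": (1.0, 8.0),
}
TUBE2_ALIQUOT_UL = 13.0           # Tube 2 volume moved to the Final Tube
FINAL_TUBE_UL = 7.0 + 13.0 + 6.0  # Tube 1 + Tube 2 + Tube 3 aliquots


def tube2_total_ul():
    """Total volume (µL) of the hybridization solution for one sample."""
    return sum(volume for _, volume in TUBE2.values())


def tube2_concentrations():
    """Concentration of each component within Tube 2."""
    total = tube2_total_ul()
    return {name: stock * volume / total for name, (stock, volume) in TUBE2.items()}


def final_concentrations():
    """Concentration of each component in the assembled hybridization reaction."""
    dilution = TUBE2_ALIQUOT_UL / FINAL_TUBE_UL
    return {name: conc * dilution for name, conc in tube2_concentrations().items()}


if __name__ == "__main__":
    for name, conc in final_concentrations().items():
        print(f"{name}: {conc:.2f}")
```

Under these assumptions the reaction contains roughly 5.4× SSPE, 5.4× Denhardt’s solution, and 0.11% SDS. Note that one batch of Tube 2 (36.8 μL) is larger than the 13 μL used per hybridization.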

3.4 RNA Bait Preparation (Tube 3)

The proprietary biotinylated RNA probes from Arbor Biosciences are highly efficient and can be diluted up to 100×, allowing for cost-effective large-scale experiments. This protocol follows a more conservative dilution of the capture probes to ensure their functionality. In a clean, sterile tube, prepare a mix containing:

1. 4.4 μL of yeast tRNA at 10 mg/mL.
2. 1.0 μL RNase Block SUPERase-In.
3. 0.5 μL biotinylated RNA probes.

Pipette in the order above: first the yeast tRNA, followed by the RNase block, and lastly the RNA probes. Mix gently with a pipette, and do not vortex this mixture. Keep on ice until the next step.

3.5 Hybridization Preparation and Running

Make sure the thermocycler program has been set according to the instructions above. This is a three-step procedure in which timing is essential to avoid sample evaporation, so we recommend having your pipette tips ready and close to the thermocycler.

1. Position Tube 1 at the thermocycler.
2. Prepare a second empty tube, Final Tube, at the thermocycler.
3. Start the thermocycler program, leaving Tube 1 at 95 °C for 5 min.
4. Once the temperature drops to 65 °C, bring Tube 2 to the thermocycler.
5. Count 3 min.
6. Bring Tube 3 to the thermocycler, allowing it to warm up to 65 °C for at least 2 min.
7. Once the third step of the program starts, quickly pipette 7 μL of Tube 1 into Final Tube.
8. Quickly pipette 13 μL of Tube 2 into Final Tube.
9. Quickly pipette all 6 μL of Tube 3 into Final Tube.
10. Mix gently and swiftly by pipetting up and down, without ever removing the tube from the thermocycler.
11. Close the Final Tube tightly and let it hybridize for 36 h.

3.6 Captured DNA Recovery

3.6.1 Preparation of Streptavidin Beads

During this step the target DNA that hybridized to the biotinylated RNA probes will be recovered using magnetic beads covered in streptavidin. First bring Wash Buffer #1 to room temperature, allowing the SDS to completely dissolve. Warm the water bath, or heat block, to 65 °C, and let the tube containing the Wash Buffer #2 preheat for at least 2 h.

1. Transfer 50 μL of Dynabeads MyOne Streptavidin C1 to a clean tube.
2. Pellet the beads using a magnetic stand for 5 min.
3. Discard the supernatant.
4. Add 200 μL of the binding buffer (1 M NaCl, 10 mM Tris–HCl, 1 mM EDTA pH 7.5) to the beads (see Note 7).
5. Vortex the mix and pellet the beads using a magnetic stand for 5 min.
6. Repeat steps 3–5 two more times.
7. Discard the supernatant.
8. Resuspend the beads in 200 μL of the binding buffer.

3.6.2 Recovery of the Captured DNA with the Streptavidin Beads

1. After the 36 h hybridization ends, transfer the contents of the Final Tube to the tube containing the beads in binding buffer (step 8 of Subheading 3.6.1).
2. Incubate for 1 h on a rotator, at room temperature. If a rotator is not available, agitate the tube every 5 min.
3. Pellet the beads on a magnetic stand for 5 min.
4. Discard the supernatant.
5. Add 180 μL of Wash Buffer #1 to the beads; vortex.
6. Incubate for 15 min at room temperature, agitating the tube every 3 min.
7. Spin down the tube.
8. Pellet the beads for 5 min on a magnetic stand, and then discard the supernatant.
9. Add 180 μL of Wash Buffer #2 to the beads; vortex.
10. Keep the beads with Wash Buffer #2 on the heat block or water bath at 65 °C for 10 min, agitating the tube every 2 min.
11. Pellet the beads for 5 min on a magnetic stand, and then discard the supernatant.
12. Repeat steps 9–11 two more times.
13. Carefully remove any remaining liquid from the beads.
14. Elute in 50 μL of TET (see Note 8).

3.7 Captured DNA Amplification

The amount of DNA recovered at the end of the previous step is very small, so PCR amplification is necessary to obtain enough DNA for sequencing. Prepare the PCR mix. Per sample:

1. 7 μL of ddH2O.
2. 25 μL of KAPA Taq HotStart Mix (see Note 9).
3. 1.5 μL of Illumina IS5 primer.
4. 1.5 μL of Illumina IS6 primer.
5. 15 μL of the captured DNA library obtained above.

Run the thermocycler with the following program:

1. 95 °C for 30 s.
2. 95 °C for 20 s.
3. 60 °C for 10 s.
4. 72 °C for 40 s.
5. Repeat steps 2–4, 30 times (see Note 10).
6. 72 °C for 5 min.
7. Keep at 4 °C.

After the PCR amplification is done, pellet the beads using a magnetic stand, and transfer the supernatant to a new clean tube. We recommend purifying the post-PCR product using 2.0× AMPure XP SPRI beads. The high concentration of SPRI beads ensures that small aDNA fragments are recovered, whereas purification by column generally removes smaller DNA fragments. Store the final product at −20 °C.
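The reaction setup and cycling program above can be written down as data, which makes it easy to check the total reaction volume and to estimate how long the program holds at temperature for a given number of cycles. This is a minimal sketch with hypothetical names; ramp times between temperatures are not included, so the real run time will be longer.

```python
# Per-sample PCR mix (µL), as listed above.
PCR_MIX_UL = {
    "ddH2O": 7.0,
    "KAPA Taq HotStart Mix": 25.0,
    "IS5 primer": 1.5,
    "IS6 primer": 1.5,
    "captured DNA library": 15.0,
}

# (temperature in °C, hold time in seconds)
INITIAL_DENATURATION = (95, 30)
CYCLE_STEPS = [(95, 20), (60, 10), (72, 40)]
FINAL_EXTENSION = (72, 5 * 60)


def reaction_volume_ul():
    """Total volume (µL) of one PCR reaction."""
    return sum(PCR_MIX_UL.values())


def hold_time_seconds(n_cycles=30):
    """Sum of programmed hold times (s), excluding ramping and the 4 °C hold."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + n_cycles * per_cycle + FINAL_EXTENSION[1]


if __name__ == "__main__":
    print(f"Reaction volume: {reaction_volume_ul()} µL")
    for cycles in (14, 30):
        print(f"{cycles} cycles: {hold_time_seconds(cycles) / 60:.1f} min of hold time")
```

The same function can be used to compare the 30-cycle program with a shorter program if qPCR suggests that fewer cycles are sufficient (see Note 10).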

4 Notes

1. An alternative to the proprietary MYcroarray Blocking Agent was described by Carpenter et al. [7]: It is necessary to synthesize a T7 universal promoter (Thermo Fisher Scientific) plus a Multiplex-block-P5 (5′-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCCTATAGTGAGTCGTATTAGTACT-3′) or a Multiplex-block-P7 (5′-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTGCCTATAGTGAGTCGTATTAGTACT-3′). 700 ng of these oligonucleotides are then subjected to in vitro transcription with T7 High-Yield RNA Synthesis kit (NEB) following the manufacturer’s protocol. The resulting solution is treated with 1 μL of TURBO DNase at 37 °C for 15 min. The product should be cleaned using the RNeasy Mini Kit (QIAGEN) with 2.5× the amount of ethanol described in the manufacturer’s protocol and eluted in 30 μL of ddH2O + 1.5 μL RNase Block SUPERase-In. The clean product should be stored at −80 °C.
2. The RNA probes from Arbor Biosciences are made custom per order and will contain sequences based on your target specimen.
3. We recommend ordering extra tube lids. Sometimes the lids will not fit tightly enough to avoid evaporation. The 0.2 mL tubes used during the hybridization process at a thermocycler can be replaced with tube strips or plates when planning multiple sample captures. We recommend having up to three thermocyclers available in case of enough samples that justify the usage of 96-well plates, so each step of the hybridization preparation can be done simultaneously.
4. If a heat block is not available, a water bath can be used. The water bath takes longer to adjust the temperature, so it requires extra care and preparation. Larger tubes should be used with a water bath, so the whole solution is below water level, while the lid is high enough to not accidentally touch the water.
5. Despite staying at the same temperature, the extra step in the machine works as a timer to help coordinate each step. Step 2 could be further separated into two steps, so instead of 5 min, there would be two steps, one with 3 min and one with 2 min.
6. For multiple libraries it is easier to prepare a tube strip with each library at one tube, then prepare the mix with the items from steps 1–3, and then add them to each tube. The same procedure can also be done in case of a plate.

7. In case of multiple samples to be prepared on a 96-well plate, reduce the binding buffer volume to 150 μL so as to fit the limited space of a plate well.
8. It is possible to have the library as is, with no PCR amplification. To achieve that, heat the beads after step 14 at 95 °C for 10 min. After that, pellet the beads with the magnetic stand, and transfer the supernatant to a new tube. Store at −20 °C.
9. The KAPA Taq HotStart Mix can be replaced with the KOD Hot Start DNA Polymerase (Merck). In that case it is recommended to remove the DNA from the beads as described in Note 8.
10. The amount of DNA captured is generally low in the case of ancient DNA, so we recommend a 30-cycle PCR instead of a regular 14-cycle PCR, although the most appropriate number of cycles can be ascertained using qPCR. Please note that overamplification will decrease the sequencing library complexity.

References

1. Green RE, Krause J, Briggs AW et al (2010) A draft sequence of the Neandertal genome. Science 328:710–722
2. Reich D, Green RE, Kircher M et al (2010) Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468:1053–1060
3. Skoglund P, Malmström H, Raghavan M et al (2012) Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336:466–469
4. Burbano HA, Hodges E, Green RE et al (2010) Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328:723–725
5. Heintzman PD, Froese D, Ives JW et al (2016) Bison phylogeography constrains dispersal and viability of the Ice Free Corridor in western Canada. Proc Natl Acad Sci U S A 113:8057–8063
6. Gnirke A, Melnikov A, Maguire J et al (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 27:182–189
7. Carpenter ML, Buenrostro JD, Valdiosera C et al (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852–864
8. Sano T, Vajda S, Cantor CR (1998) Genetic engineering of streptavidin, a versatile affinity tag. J Chromatogr B Biomed Sci Appl 715:85–91

Chapter 14

Application of Solid-State Capture for the Retrieval of Small-to-Medium Sized Target Loci from Ancient DNA

Johanna L. A. Paijmans, Gloria González Fortes, and Daniel W. Förster

Abstract

Genetic studies that include ancient samples are often hampered by the low amount of endogenous DNA that such samples contain, relative to co-extracted “contaminant” DNA from other organisms. One approach to mitigate this challenge is to perform hybridization-based capture of target genomic regions using DNA or RNA baits. Such baits are designed to have high sequence similarity to the target genomic regions and can reduce the off-target fraction in DNA sequencing libraries. Here, we present a protocol to use Agilent SureSelect microarrays to enrich ancient DNA libraries for small-to-medium-sized target loci, such as mitochondrial genomes. The protocol builds on previously published work by introducing modifications that improve recovery of short DNA fragments while minimizing the cost and duration of the experiment.

Key words Ancient DNA, Next-generation sequencing, Target enrichment, Microarray capture, Hybridization capture

1 Introduction

The field of ancient DNA has leaped into the palaeogenomic era, in which ancient DNA researchers are now using genome-scale data retrieved from archival or ancient samples. However, such research is contingent on the availability of high-quality samples, where quality is determined by a high amount and complexity of endogenous DNA (the genetic material from the organism of interest) and little contaminant DNA. Unfortunately, such samples are rare and tend to originate from locations where environmental conditions are favorable for post-mortem sample preservation, such as permafrost or caves. As a result, effort has gone into developing approaches to recover greater amounts of DNA from poorly preserved samples and thereby extend the range of samples that can be used for palaeogenomic studies. For example, materials and results have been compared to determine the sampling strategy that is most likely to yield a specimen with the highest endogenous content (e.g., [1–3]). Furthermore, several types of pretreatments have been developed that are aimed at reducing the amount of contaminant DNA present in a sample prior to extraction [1, 4, 5]. Although targeted sampling and pretreatment strategies can reduce the amount of contaminant DNA, they cannot completely remove it [4, 5]. Finally, in some cases, large amounts of data have been retrieved from relatively low-quality samples by “brute force sequencing” (e.g., [6, 7]), in which multiple extracts and multiple libraries from each extract are sequenced to exhaustion. This approach can be costly, however, and can require significant amounts of sample to be destroyed in the process (e.g., >1500 mg; [6]).

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_14, © Springer Science+Business Media, LLC, part of Springer Nature 2019

For many research questions, it is sufficient to recover data for a small number of genetic loci from a large number of individuals, rather than the other way around [8]. In these cases, hybridization capture applications can be used to target and enrich particular genetic region(s) of interest by reducing the fraction of off-target DNA prior to sequencing. Hybridization capture was applied to reconstruct the first Neanderthal mitochondrial genome in 2009 [9] and has since found wide application in the enrichment of ancient DNA from subfossil samples (e.g., [10]). Several approaches to hybridization capture have been developed, including both in-solution and array-based approaches. The choice of capture technology depends on factors such as the size or number of target regions. Array-based, or microarray, capture [11] is generally used to enrich small-to-medium-sized targets, such as mitogenomes (e.g., [12]), a selection of single-nucleotide polymorphisms (SNPs; e.g., [13]), a small number of genes or exons (e.g., [14]), or whole genomes of organisms with small genome sizes, such as pathogens (e.g., [15, 16]). Microarrays are glass slides with single-stranded synthetic baits printed on the surface (called “spots”).
On-target library molecules are immobilized during hybridization (generally at 65 °C, but see [12]). After hybridization, off-target (unhybridized) DNA is washed away. The hybridized, on-target DNA can then be eluted, amplified, and sequenced. The most commonly used slides, SureSelect Arrays from Agilent, offer slides with up to one million bait spots, each up to 60 bp long, which allows for a target size of about six million bases assuming a 10 bp spacing strategy. When a larger target size is desired, several arrays can be used in parallel (e.g., [15]). Array-based enrichment of much larger targets, for example, a mammalian genome, is not feasible with the currently available technology.

Because of the relatively high cost of a microarray slide, this type of capture generally involves pooling multiple libraries on a single array, thus reducing the per-sample cost of capture. The maximum number of samples that can be pooled on an array depends on the size of the target region as well as the quality of the samples. For a small target such as the mitochondrial genome, more than 50 ancient samples can be enriched simultaneously on a single array [17]. Such a strategy allows high sample throughput with little hands-on time in the lab. When pooling samples of unknown preservation quality, as is generally the case for ancient DNA, it is advisable to adjust the pooling strategy [12]. Higher-quality samples can outcompete lower-quality samples on the array, leading to an imbalance in sequencing amount per sample [17, 18]. This bias can be mitigated by pooling samples according to their expected quality [12]; ideally, endogenous content should be established beforehand through qPCR [19] or low-level shotgun sequencing.

To further leverage the output from a single capture array, researchers have developed protocols that cleave the single-stranded DNA baits off the array slide, making it possible to use the same baits for in-solution capture (e.g., [20–22]). If the manufacturer consents to this approach, baits can be designed to contain a priming site that allows post-cleaving amplification of the baits. Such an approach can potentially generate a sufficient quantity of baits to capture more samples (individually, in-solution) than can be multiplexed on a single array. These advances have allowed array capture to be applied in an increasing diversity of ancient DNA studies, including in the field of ancient pathogens [15, 16, 23].

In this chapter, we outline a protocol for the enrichment of ancient DNA samples using microarrays. The protocol is based on that described by [11], with several optimizations taken from other protocols [24–26]. We also briefly describe how to design array baits for small targets (i.e., mitogenomes) and discuss some basic troubleshooting.

2 Materials

1. Oligonucleotides (oligos), adapted from [24, 25, 27] (Table 1).

Table 1 Oligo sequences

Name | Sequence | Comments
Amplification primers
IS5 | AAT GAT ACG GCG ACC ACC GA |
IS6 | CAA GCA GAA GAC GGC ATA CGA |
IS7 | ACA CTC TTT CCC TAC ACG AC |
IS8 | GTG ACT GGA GTT CAG ACG TGT |
CL72 (custom R1 sequencing primer for single-stranded libraries) | ACA CTC TTT CCC TAC ACG ACG CTC TTC C | (See Note 1)
Gesaffelstein (custom I2 sequencing primer for single-stranded libraries) | GGA AGA GCG TCG TGT AGG GAA AGA GTG T | (See Note 1)
Blocking oligos
BO_P5_ext_F | ATC TCG TAT GCC GTC TTC TGC TTG-Pho | “Pho” indicates a 3′-end phosphate (see Notes 2–5)
BO_P5_ext_R | CAA GCA GAA GAC GGC ATA CGA GAT-Pho | 3′-end phosphate (see Notes 2–5)
BO_P5-SS_trunc_F | ACA CTC TTT CCC TAC ACG ACG CTC TTC C-Pho | 3′-end phosphate (see Notes 2–5)
BO_P5-SS_trunc_R | G GAA GAG CGT CGT GTA GGG AAA GAG TG-Pho | 3′-end phosphate (see Notes 2–5)
BO_P7_trunc_F | AGA TCG GAA GAG CAC ACG TCT GAA CTC CAG TCA C-Pho | 3′-end phosphate (see Notes 2–5)
BO_P7_trunc_R | GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T-Pho | 3′-end phosphate (see Notes 2–5)
BO_P7_ext_F | ATC TCG TAT GCC GTC TTC TGC TTG-Pho | 3′-end phosphate (see Notes 2–5)
BO_P7_ext_R | CAA GCA GAA GAC GGC ATA CGA GAT-Pho | 3′-end phosphate (see Notes 2–5)

2.1 Reagents

1. Oligo hybridization buffer (10×): 500 mM NaCl, 10 mM Tris–HCl (pH 8.0), 1 mM EDTA (pH 8.0).
2. Taq (Herculase).
3. dNTPs (e.g., Invitrogen).
4. Agilent aCGH/Chip-on-Chip Wash Buffer 1 (Agilent, cat no. 5188-5221).
5. Agilent aCGH/Chip-on-Chip Wash Buffer 2 (Agilent, cat no. 5188-5222).
6. Oligo aCGH/ChIP-on-Chip Hybridization Kit (Agilent, cat no. 5188-5220): contains 2× Hi-RPM Hybridization Buffer and 10× Oligo aCGH/ChIP-on-Chip Blocking Agent. The Blocking Agent needs to be resuspended by adding 1350 μL nuclease-free water. Incubate at room temperature for 60 min and mix gently.
7. Qiagen MinElute PCR purification columns (Qiagen, cat no. 28004): contains Qiagen MinElute purification spin columns, PB binding buffer, PE washing buffer and EB elution buffer, as well as loading dye and pH indicator (not used). Before use, add ethanol (96–100%) to PE buffer (see bottle label for volume).
8. SYBR Green PCR Master Mix (Applied Biosystems, Art. No. 4309155).
9. Tapestation D1000 ScreenTape (or other DNA quantification kit compatible with available equipment).

2.2 Equipment

1. Microcentrifuge in ancient and modern lab (e.g., Labnet Spectrafuge 24D).
2. Quantitative PCR instrument (e.g., PikoReal™ Real-Time PCR System).
3. Microarray hybridization oven, temperature range up to 95 °C (steps 9 and 18, Subheading 3.2; e.g., Affymetrix Model 777 Hybridization Oven) (see Note 6).
4. Agilent Tapestation 2200 (or comparable, e.g., Agilent BioAnalyzer).
5. SureHyb DNA Microarray Chamber (steps 4–19, Subheading 3.2) (Agilent).
6. Microarray gasket slides (Agilent).
7. Slide spinner or centrifuge fitted with microplate adapters.
8. Slide rack.
9. Three slide staining dishes.
10. One large dish (large enough to envelop one of the slide staining dishes with sufficient room to spare).
11. Heating blocks (95 °C and 37 °C).
12. Magnetic hot plate and stirrers.
13. 30 gauge needles.
14. 1 mL syringes.

3 Methods

3.1 Array Design (See Note 7)

1. Identify the target sequences for the desired species (or close relative), and generate 60 bp baits with the desired tiling. A repeat detection and masking tool such as RepeatMasker (http://www.repeatmasker.org/) can be used to mask repeats. The resulting bait-set can then be used to custom design a microarray (e.g., Agilent SureSelect DNA capture array).
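Generating tiled baits from a target sequence can be sketched as follows. This is an illustrative example only, not the authors' design pipeline: the function name is hypothetical, the 10 bp tiling step is just a default for the example, and the optional filter assumes that repeats were hard-masked to `N` beforehand. Circular targets such as mitogenomes would additionally need baits spanning the junction between the end and the start of the sequence, which is not handled here.

```python
def tile_baits(target, bait_length=60, step=10, skip_masked=True):
    """Return (start, bait) tuples tiled across a linear target sequence.

    start is the 0-based position of the bait in the target. Baits that
    contain 'N' (assumed to mark masked repeats) are dropped if skip_masked.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    if len(target) < bait_length:
        raise ValueError("target is shorter than the bait length")

    last_start = len(target) - bait_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # add a final bait so the 3' end is covered

    baits = []
    for start in starts:
        bait = target[start:start + bait_length]
        if skip_masked and "N" in bait.upper():
            continue
        baits.append((start, bait))
    return baits


if __name__ == "__main__":
    example = "ACGT" * 30  # 120 bp toy sequence, not real data
    for start, bait in tile_baits(example):
        print(start, bait)
```

A smaller step gives denser tiling (each base covered by more baits) at the cost of using more bait spots on the array.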

3.2 Hybridization Capture

1. Pool the libraries in equimolar amounts (see Note 9), and bring the volume up to 168 μL with H2O.
2. Prepare hybridization mix as follows (see Note 8):
(a) 168 μL of the pooled library.
(b) 5 μL of each blocking oligo (Table 1) (200 μM).
(c) 52 μL of Agilent Blocking Agent (10×).
(d) 260 μL of Agilent Hybridization Buffer (2×).
3. Incubate the mixture at 95 °C for 3 min, followed by 37 °C for 30 min (see Note 10).
4. Disassemble the hybridization chamber, and place a clean, dry gasket in the chamber.
5. Load 490 μL of the mixture on the gasket (see Note 11).
6. Slowly load the array down on the gasket: numbered barcode facing up, Agilent-labeled barcode facing down (Reminder: Agilent side = Active side).
7. Reassemble the chamber; screw tight to secure.
8. Make sure there are no stationary bubbles by rotating the chamber. If there are stationary bubbles, tap the chamber on a solid surface to dislodge them.
9. Hybridize for 60–65 h at 65 °C under rotation at 12 rpm (see Note 13).
10. After 48 h (i.e., the day before the hybridization is complete), preheat the materials necessary for the washing step with washing buffer 2 (see step 1). Fill one bottle with H2O and one bottle with Agilent washing buffer 2, and incubate overnight at 37 °C together with the glassware.
11. Disassemble the hybridization chamber, and move the whole gasket-array sandwich into a dish with washing buffer 1. Pry the slides apart (see Note 12), and move the array to the slide holder and washing dish 1. Make sure the slide is not exposed to air for extended periods of time.
12. With gentle stirring, incubate at room temperature for 10 min.
13. Prepare the washing dish 2: place the small dish inside the big dish. Fill the big dish with water and the small dish with Agilent washing buffer 2. This way, the washing buffer will maintain a constant temperature of 37 °C.
14. Move the slide holder with the array into the dish containing washing buffer 2.
15. With gentle stirring, incubate at 37 °C for 5 min.
16. Spin the slide for 1 min at 600 rpm to dry.
17. Load 490 μL nuclease-free water on a new gasket, and reassemble the hybridization chamber. Again, make sure there are no stationary bubbles.
18. Incubate for 10 min at 95 °C. If the temperature drops after opening the oven, start the timer when the oven reaches 90 °C (see Note 6).
19. Use a needle and syringe to extract the eluate: first, slightly loosen the screw of the hybridization chamber. Then push the needle through the rubber seal on the unlabeled side of the slides. If the needle does not go through, loosen the screw of the chamber a bit more and try again. Elute the entire volume from the array before disassembling the chamber.
20. After capture, the optimal cycle numbers to amplify the enriched product can be established using qPCR to avoid insufficient or over-amplification. Set up the amplification master mix as follows, for each reaction (see Note 8):
(a) 3.6 μL of H2O.
(b) 0.2 μL of primer IS5 (10 μM).
(c) 0.2 μL of primer IS6 (10 μM).
(d) 5 μL of SYBR Green Master Mix.
21. Combine 1 μL from the eluate with 9 μL master mix, in three replicate reactions. Mix gently and amplify according to the following temperature profile: initial denaturation at 94 °C for 10 min, followed by 40 cycles of 94 °C for 30 s, 60 °C for 45 s, and 72 °C for 45 s.
22. Based on the resulting amplification curve, pick the appropriate cycle numbers.
23. When the optimal cycle numbers are established, amplify the enriched library pool in 24 parallel reactions. Set up the amplification master mix as follows, for each reaction (see Note 8):
(a) 17 μL of H2O.
(b) 12 μL of Herculase Buffer (5×).
(c) 2.4 μL of primer IS5 (10 μM).
(d) 2.4 μL of primer IS6 (10 μM).
(e) 0.6 μL of dNTPs (25 mM each).
(f) 0.6 μL Herculase Fusion (see Note 14).
24. Add 35 μL of master mix to each 25 μL sample. Mix gently and amplify according to the following temperature profile: initial denaturation at 94 °C for 10 min, followed by the appropriate number of cycles (established by qPCR, see Note 15) of 94 °C for 30 s, 60 °C for 45 s, and 72 °C for 45 s, followed by a final extension at 72 °C for 3 min.
25. Purify the pooled parallel reactions using MinElute PCR purification columns, as described in Subheading 3.2, step 5. Elute every column twice with 10 μL EB buffer, each with 5 min incubation time.
26. Verify the success of the amplification on a Tapestation D1000 ScreenTape or Agilent Bioanalyzer DNA 1000 analysis kit.


27. Repeat the capture with the amplified product (starting from Subheading 3.2).
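The adjustment of cycle numbers between the qPCR (steps 20–22) and the full amplification (steps 23 and 24) follows the reasoning given in Note 15: if the product is assumed to double in every cycle, a reaction that is k times larger needs about log2(k) additional cycles. The sketch below illustrates this arithmetic. The function names are hypothetical, rounding up to a whole cycle is an assumption, and the qPCR cycle used in the example is a made-up value.

```python
import math

QPCR_VOLUME_UL = 1.0 + 9.0   # eluate + master mix (step 21)
PCR_VOLUME_UL = 25.0 + 35.0  # sample + master mix (step 24)


def extra_cycles(qpcr_volume_ul=QPCR_VOLUME_UL, pcr_volume_ul=PCR_VOLUME_UL):
    """Cycles to add for the larger reaction, assuming doubling per cycle.

    Rounding up to the next whole cycle is an assumption of this sketch.
    """
    if qpcr_volume_ul <= 0 or pcr_volume_ul <= 0:
        raise ValueError("volumes must be positive")
    ratio = pcr_volume_ul / qpcr_volume_ul
    return max(0, math.ceil(math.log2(ratio)))


def planned_cycles(qpcr_cycle):
    """Cycle number for the full amplification, given the cycle chosen
    from the exponential phase of the qPCR curve."""
    return qpcr_cycle + extra_cycles()


if __name__ == "__main__":
    chosen = 12  # hypothetical cycle read from a qPCR curve
    print(f"Volume ratio: {PCR_VOLUME_UL / QPCR_VOLUME_UL:.0f}x")
    print(f"Extra cycles: {extra_cycles()}")
    print(f"Planned cycles: {planned_cycles(chosen)}")
```

For the volumes in this protocol the ratio is six, and log2(6) is approximately 2.6, which rounds up to the three additional cycles mentioned in Note 15.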

4 Notes

1. When the single-stranded library preparation protocol of [25] is used, custom sequencing primers are required for the R1 and Index2 reads on the NextSeq 500/550 [27].
2. The P5 blocking oligos presented here are compatible with the single-stranded library preparation by [25]. If a double-stranded library preparation protocol was used, these two can be replaced by BO_P5-DS_trunc_F: ACA CTC TTT CCC TAC ACG ACG CTC TTC CGA TCT-Pho and BO_P5-DS_trunc_R: AGA TCG GAA GAG CGT CGT GTA GGG AAA GAG TG-Pho.
3. The P7 blocking oligos presented here are also compatible with the indexing method described in [24], as it has two independent blocking oligos for the P7 adapter.
4. The P5 blocking oligos presented here are also compatible with the indexing method described in [24], as it has two independent blocking oligos for the P5 adapter.
5. The indices are not actually blocked. However, because the indices are only 8 bp long, any spurious hybridization should be disrupted during the heated washing step.
6. For Subheading 3.2, step 18, if no oven is available that rotates and goes up to 95 °C, an alternative is to put the chamber on one side, incubate for 5 min, rotate to the other side, and incubate for another 5 min.
7. The array described here was designed to enrich for whole mitogenomes. However, the procedure should be suitable for other targets as well.
8. We recommend counting one more reaction than the number of samples, due to volume loss during pipetting.
9. This protocol is designed for a maximum of 20 μg DNA, although successful enrichment is possible with much lower quantities too.
10. Masking tape should be used to close the lid of the tube during the 95 °C incubation; parafilm may not be sufficient to keep the lid closed.
11. Distribute the mixture equally over the surface while keeping away from the rubber seal around the edges.


12. If the slides are difficult to pry apart, try piercing the rubber seal with a needle to release the pressure.
13. There are studies suggesting that changing the hybridization temperature can improve the capture efficiency and yield higher on-target ratios in the captured product. A touchdown protocol has been shown to be effective for modern samples [12, 28], and a lower hybridization temperature (e.g., 55 °C; [29]) could be beneficial for samples with low endogenous content. Careful consideration of the hybridization temperature would be advisable for challenging samples.
14. Herculase Fusion has been found to have few biases for library amplification [30].
15. A duplication of molecules after each PCR cycle serves as a basic assumption for calculating the number of cycles. Ideally, the cycle number should be selected in the exponential phase of the PCR. As the volume for the indexing PCR is six times larger than that of the qPCR, three cycles should be added.

References

1. de Damgaard PB, Margaryan A, Schroeder H, Orlando L, Willerslev E, Allentoft ME (2015) Improving access to endogenous DNA in ancient bones and teeth. bioRxiv 014985. https://doi.org/10.1101/014985
2. Pinhasi R, Fernandes D, Sirak K, Novak M, Connell S, Alpaslan-Roodenberg S, Gerritsen F, Moiseyev V, Gromov A, Raczky P, Anders A, Pietrusewsky M, Rollefson G, Jovanovic M, Trinhhoang H, Bar-Oz G, Oxenham M, Matsumura H, Hofreiter M (2015) Optimal ancient DNA yields from the inner ear part of the human petrous bone. PLoS One 10:e0129102. https://doi.org/10.1371/journal.pone.0129102
3. Alberti F, Gonzalez J, Paijmans JLA, Basler N, Preick M, Henneberger K, Trinks A, Rabeder G, Conard NJ, Münzel SC, Joger U, Fritsch G, Hildebrandt T, Hofreiter M, Barlow A (2018) Optimized DNA sampling of ancient bones using Computed Tomography scans. Mol Ecol Res. https://doi.org/10.1111/1755-0998.12911
4. Korlević P, Gerber T, Gansauge M-T, Hajdinjak M, Nagel S, Aximu-Petri A, Meyer M (2015) Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. BioTechniques 59. https://doi.org/10.2144/000114320
5. Basler N, Xenikoudakis G, Westbury MV, Song L, Sheng G, Barlow A (2017) Reduction of the contaminant fraction of DNA obtained from an ancient giant panda bone. BMC Res Notes 10. https://doi.org/10.1186/s13104-017-3061-3
6. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, Hansen NF, Durand EY, Malaspinas A-S, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Z, Gušic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, De La Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PLF, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Pääbo S (2010) A draft sequence of the Neandertal Genome. Science 328:710–722. https://doi.org/10.1126/science.1188021
7. Meyer M, Arsuaga J-L, de Filippo C, Nagel S, Aximu-Petri A, Nickel B, Martínez I, Gracia A, de Castro JM, Carbonell E, Viola B, Kelso J, Prüfer K, Pääbo S (2016) Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531:504–507. https://doi.org/10.1038/nature17405
8. Jones MR, Good JM (2016) Targeted capture in evolutionary and ecological genomics. Mol Ecol 25:185–202. https://doi.org/10.1111/mec.13304
9. Briggs AW, Good JM, Green RE, Krause J, Maricic T, Stenzel U, Lalueza-Fox C, Rudan P, Brajković D, Kućan Z, Gušić I, Schmitz R, Doronichev VB, Golovanova LV, de la Rasilla M, Fortea J, Rosas A, Pääbo S (2009) Targeted retrieval and analysis of five Neandertal mtDNA Genomes. Science 325:318–321. https://doi.org/10.1126/science.1174462
10. Rizzi E, Lari M, Gigli E, Bellis GD, Caramelli D (2012) Ancient DNA studies: new perspectives on old samples. Genet Sel Evol 44:1–19. https://doi.org/10.1186/1297-9686-44-21
11. Hodges E, Rooks M, Xuan Z, Bhattacharjee A, Benjamin Gordon D, Brizuela L, Richard McCombie W, Hannon GJ (2009) Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protoc 4:960–974. https://doi.org/10.1038/nprot.2009.68
12. Paijmans JLA, Fickel J, Courtiol A, Hofreiter M, Förster DW (2016) Impact of enrichment conditions on cross-species capture of fresh and degraded DNA. Mol Ecol Resour 16:42–55. https://doi.org/10.1111/1755-0998.12420
13. King TE, Fortes GG, Balaresque P, Thomas MG, Balding D, Delser PM, Neumann R, Parson W, Knapp M, Walsh S, Tonasso L, Holt J, Kayser M, Appleby J, Forster P, Ekserdjian D, Hofreiter M, Schürer K (2014) Identification of the remains of King Richard III. Nat Commun 5:5631. https://doi.org/10.1038/ncomms6631
14. Springer MS, Signore AV, Paijmans JLA, Vélez-Juarbe J, Domning DP, Bauer CE, He K, Crerar L, Campos PF, Murphy WJ, Meredith RW, Gatesy J, Willerslev E, MacPhee RDE, Hofreiter M, Campbell KL (2015) Interordinal gene capture, the phylogenetic position of Steller’s sea cow based on molecular and morphological data, and the macroevolutionary history of Sirenia. Mol Phylogenet Evol 91:178–193. https://doi.org/10.1016/j.ympev.2015.05.022
15. Bos KI, Schuenemann VJ, Golding GB, Burbano HA, Waglechner N, Coombes BK, McPhee JB, DeWitte SN, Meyer M, Schmedes S, Wood J, Earn DJD, Herring DA, Bauer P, Poinar HN, Krause J (2011) A draft genome of Yersinia pestis from victims of the Black Death. Nature 478:506–510. https://doi.org/10.1038/nature10549
16. Bos KI, Jäger G, Schuenemann VJ, Vågene ÅJ, Spyrou MA, Herbig A, Nieselt K, Krause J (2015) Parallel detection of ancient pathogens via array-based DNA capture. Philos Trans R Soc Lond Ser B Biol Sci 370. https://doi.org/10.1098/rstb.2013.0375
17. Fortes GG, Grandal-d’Anglade A, Kolbe B, Fernandes D, Meleg IN, García-Vázquez A, Pinto-Llona AC, Constantin S, de Torres TJ, Ortiz JE, Frischauf C, Rabeder G, Hofreiter M, Barlow A (2016) Ancient DNA reveals differences in behaviour and sociality between brown bears and extinct cave bears. Mol Ecol 25:4907–4918. https://doi.org/10.1111/mec.13800
18. Hawkins MTR, Hofman CA, Callicrate T, McDonough MM, Tsuchiya MTN, Gutiérrez EE, Helgen KM, Maldonado JE (2015) In-solution hybridization for mammalian mitogenome enrichment: pros, cons and challenges associated with multiplexing degraded DNA. Mol Ecol Res. https://doi.org/10.1111/1755-0998.12448
19. Enk J, Rouillard J-M, Poinar H (2013) Quantitative PCR as a predictor of aligned ancient DNA read counts following targeted enrichment. BioTechniques 55:300–309
20. Fu Q, Meyer M, Gao X, Stenzel U, Burbano HA, Kelso J, Paabo S (2013) DNA analysis of an early modern human from Tianyuan Cave, China. Proc Natl Acad Sci 110:2223–2227. https://doi.org/10.1073/pnas.1221359110
21. Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, Fu Q, Mittnik A, Bánffy E, Economou C, Francken M, Friederich S, Pena RG, Hallgren F, Khartanovich V, Khokhlov A, Kunst M, Kuznetsov P, Meller H, Mochalov O, Moiseyev V, Nicklisch N, Pichler SL, Risch R, Rojo Guerra MA, Roth C, Szécsényi-Nagy A, Wahl J, Meyer M, Krause J, Brown D, Anthony D, Cooper A, Alt KW, Reich D (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522:207–211. https://doi.org/10.1038/nature14317
22. Castellano S, Parra G, Sanchez-Quinto FA, Racimo F, Kuhlwilm M, Kircher M, Sawyer S, Fu Q, Heinze A, Nickel B, Dabney J, Siebauer M, White L, Burbano HA, Renaud G, Stenzel U, Lalueza-Fox C, de la Rasilla M, Rosas A, Rudan P, Brajković D, Željko K, Gušic I, Shunkov MV, Derevianko AP, Viola B, Meyer M, Kelso J, Andres AM, Paabo S (2014) Patterns of coding variation in the complete exomes of three Neandertals. Proc Natl Acad Sci 111:6666–6671. https://doi.org/10.1073/pnas.1405138111
23. Schuenemann VJ, Bos K, DeWitte S, Schmedes S, Jamieson J, Mittnik A, Forrest S, Coombes BK, Wood JW, Earn DJD, White W, Krause J, Poinar HN (2011) Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death. Proc Natl Acad Sci 108:E746–E752. https://doi.org/10.1073/pnas.1105107108
24. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010. https://doi.org/10.1101/pdb.prot5448
25. Gansauge M-T, Meyer M (2013) Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat Protoc 8:737–748. https://doi.org/10.1038/nprot.2013.038
26. Fortes GG, Paijmans JLA (2016) Analysis of whole mitogenomes from ancient samples. In: Kroneis T (ed) Whole genome amplification. Humana Press, New York
27. Paijmans JLA, Baleka S, Henneberger K, Taron UH, Trinks A, Westbury MV, Barlow A (2017) Sequencing single-stranded libraries on the Illumina NextSeq 500 platform. arXiv:1711.11004 [q-bio]
28. Li C, Hofreiter M, Straube N, Corrigan S, Naylor GJ (2013) Capturing protein-coding genes across highly divergent species. BioTechniques 54:321–326
29. Cruz-Dávalos DI, Llamas B, Gaunitz C, Fages A, Gamba C, Soubrier J, Librado P, Seguin-Orlando A, Pruvost M, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Scheu A, Beneke N, Ludwig A, Cooper A, Willerslev E, Orlando L (2017) Experimental conditions improving in-solution target enrichment for ancient DNA. Mol Ecol Resour 17:508–522. https://doi.org/10.1111/1755-0998.12595
30. Dabney J, Meyer M (2012) Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. BioTechniques 52:87–94. https://doi.org/10.2144/000113809

Chapter 15

Targeted PCR Amplification and Multiplex Sequencing of Ancient DNA for SNP Analysis

Saskia Wutke and Arne Ludwig

Abstract

The analysis of single-nucleotide polymorphisms (SNPs) has proven to be advantageous for addressing variation in highly degraded or low-quality DNA samples. This is because only short fragments need to be amplified to analyze SNPs, and this can be achieved by multiplex PCR. Here, we present a sensitive method for the targeted sequencing of SNP loci that requires only small amounts of template DNA. The approach combines multiplex amplification of very short fragments covering SNP positions with sample barcoding and next-generation sequencing. This method allows generation of data from large sample sets of poorly preserved specimens, such as fossil remains, forensic samples, and museum specimens. The approach is cost-effective, rapid, and applicable to forensics, population genetics, and phylogenetic research questions.

Key words Ancient DNA, Single-nucleotide polymorphism, Multiplex PCR, Next-generation sequencing, Library preparation

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_15, © Springer Science+Business Media, LLC, part of Springer Nature 2019

1 Introduction

Ancient DNA (aDNA) approaches are used to investigate genetic and phenotypic variation in historical samples. The incorporation of historic and ancient samples into evolutionary research has been accelerated by the development of next-generation sequencing technologies. However, DNA degradation and low endogenous DNA content continue to pose challenges in aDNA research [1, 2]. Therefore, DNA extracts often need to be enriched for the genomic regions of interest, e.g., by targeted capture or PCR amplification. A convenient way to overcome the problem of DNA degradation is to analyze single-nucleotide polymorphisms (SNPs), as SNP typing requires recovering only short fragments that span the relevant positions [3]. SNP analysis is an established and powerful tool for studying genetic variation and can be applied to address a broad range of questions, such as population structure or taxonomic classification [4–6]. SNPs can also be used to infer phenotypic traits from highly degraded samples and thus can provide insights into the impact of natural or artificial selection on phenotypes [7–10].

Multiplex PCR offers a way to amplify several target regions in one reaction and thereby saves both time and valuable template DNA. Although hybridization capture methods have advantages compared to PCR enrichment [11, 12], a major benefit of SNP typing via multiplex PCR is the fast and inexpensive simultaneous analysis of known SNP loci in hundreds of samples. In particular, combining multiplex amplification with direct high-throughput sequencing reduces both cost and time considerably [13].

We describe a method to genotype many different SNPs in hundreds of ancient samples [14–17]. First, we use a modified multiplex PCR approach in which all target-specific primers are tagged with universal tags (CS1 forward tag, CS2 reverse tag; see Fluidigm Access Array system, Fluidigm Corporation). The diluted PCR products from this first-step multiplex amplification then serve as a template for the subsequent indexing PCR. In this step, each multiplex PCR product is coupled with a sample-specific barcode and the Illumina-specific adapter sequences using a maximum of 384 single-direction barcode primers (Fluidigm Corporation). Finally, all individually barcoded samples are purified using Agencourt AMPure XP (SPRI) beads (Beckman Coulter Genomics) to remove primer and adapter dimers and other amplification artifacts, quantified with the Quant-iT™ PicoGreen® dsDNA Assay Kit (Thermo Fisher Scientific), and pooled equimolarly. The generated multiplex amplicon libraries can then be sequenced on any Illumina sequencing platform.

2 Materials

2.1 Multiplex PCR Amplification

1. aDNA template (see Note 1).
2. AmpliTaq Gold DNA polymerase kit (Thermo Fisher Scientific) including 5 U/μl enzyme, GeneAmp 10× Buffer II, and 25 mM MgCl2 solution.
3. 10 mg/ml bovine serum albumin (BSA).
4. dNTP mix of 25 mM each dATP, dCTP, dGTP, and dTTP (see Note 2).
5. Primer mix containing 1 μM of each primer (see Note 3).
6. HPLC grade water.
7. Filter tips and PCR reaction tubes/strips/plates (see Note 4).
8. Laminar flow cabinet (see Note 5).
9. Thermocycler.

2.2 Preparation of Indexed Amplicon Libraries

1. AmpliTaq Gold DNA polymerase kit (Thermo Fisher Scientific) including 5 U/μl enzyme, GeneAmp 10× Buffer II, and 25 mM MgCl2 solution.
2. 10 mg/ml bovine serum albumin (BSA).
3. Barcode Library for Illumina Sequencers (Fluidigm).
4. dNTP mix of 25 mM each dATP, dCTP, dGTP, and dTTP.
5. HPLC grade water.
6. Filter tips and PCR reaction tubes/plates.
7. Thermocycler.

2.3 Library Purification and Quantification

1. Agencourt AMPure XP Reagent beads (Beckman Coulter Genomics).
2. 96-well magnetic plate (e.g., Beckman Coulter, Thermo Fisher Scientific).
3. Quant-iT™ PicoGreen® dsDNA Assay Kit (Thermo Fisher Scientific).

2.4 Illumina Sequencing

1. Qubit fluorometer (Thermo Fisher Scientific).
2. CS1, CS2, and their reverse complements as sequencing primers.
3. Illumina sequencing kits.

3 Methods

3.1 Multiplex PCR

Setting up the multiplex PCR should be carried out in a specialized ancient DNA laboratory. All surfaces and equipment should be cleaned with bleach and ethanol to avoid contamination. Always include one PCR blank per eight to ten samples.

1. Prepare the primer mix by combining all forward and reverse primers. In the resulting solution, each primer should have a final concentration of 1 μM.
2. Prepare the multiplex master mix by adding all reagents except the DNA template (see Table 1). Prepare sufficient master mix for all samples and PCR blanks.
3. Distribute 16 μl of the master mix into PCR strips/plates and close all lids.
4. To avoid cross-contamination, add 4 μl DNA template to each reaction tube individually (open only one tube or row of tubes at a time).
5. After PCR setup, put the tubes/strips/plate in a thermocycler outside the ancient DNA lab, and run the amplification: initial step at 95 °C for 10 min to activate the hot-start polymerase, followed by 30 cycles of denaturation for 10 s at 94 °C, primer annealing for 30 s at the required annealing temperature, and elongation for 30 s at 72 °C. The protocol ends with a final extension step at 72 °C for 4 min (see Notes 2 and 6).
6. After amplification is finished, dilute the reactions 1:10–1:100 with HPLC grade water. This will be the template for the indexing PCR.

Table 1 Multiplex PCR master mix

| Reagent | Volume per reaction (μl) | Final concentration |
|---|---|---|
| HPLC grade water | Add up to 20 μl | |
| AmpliTaq Gold PCR buffer II (10×) | 2 | 1× |
| MgCl2 (25 mM) | 2–3.2 | 2.5–4 mM |
| Bovine serum albumin (BSA) (10 mg/ml) | 2 | 1 mg/ml |
| dATP, dCTP, dGTP, dTTP (25 mM each) | 0.2 | 250 μM each |
| Primer mix (1 μM each) | 3 | 150 nM each |
| AmpliTaq Gold (5 U/μl) | 0.4 | 2 U |
| DNA template | 4 | |

3.2 Indexing PCR

Again include one PCR blank per eight to ten samples, resulting in separate negative controls for the initial multiplex PCR and the indexing PCR.

1. Prepare the master mix by adding all reagents except the diluted multiplex PCR products and barcoded indexing primers (see Table 2). Prepare sufficient master mix for all samples and PCR blanks (see Notes 3 and 7).
2. Distribute 7 μl of the master mix into PCR strips/plates and close all lids.
3. Add 2 μl of individual barcode primer to each reaction tube (see Note 7).
4. Add 1 μl of diluted PCR product (from Subheading 3.1, step 6) to each reaction tube individually.
5. After PCR setup, run the indexing PCR in a thermocycler: initial step at 95 °C for 10 min to activate the hot-start polymerase, followed by 10 cycles of denaturation for 15 s at 95 °C, primer annealing for 30 s at 60 °C, and elongation for 60 s at 72 °C. The protocol ends with a final extension step at 72 °C for 4 min.
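Scaling the per-reaction volumes to a full batch is simple arithmetic, sketched below for the multiplex PCR. The per-reaction volumes are taken from the master mix table for the multiplex PCR, using the lower end of the MgCl2 range. The blank frequency of one per eight samples follows the text. The single spare reaction for pipetting loss and all function names are assumptions made for illustration, not part of the protocol.

```python
import math

# Per-reaction volumes in ul for the multiplex PCR master mix,
# excluding water and the 4 ul of DNA template added separately.
MULTIPLEX_MIX_UL = {
    "AmpliTaq Gold PCR buffer II (10x)": 2.0,
    "MgCl2 (25 mM)": 2.0,  # lower end of the 2-3.2 ul range
    "BSA (10 mg/ml)": 2.0,
    "dNTPs (25 mM each)": 0.2,
    "Primer mix (1 uM each)": 3.0,
    "AmpliTaq Gold (5 U/ul)": 0.4,
}


def n_reactions(n_samples: int, samples_per_blank: int = 8, spare: int = 1) -> int:
    """Samples plus one PCR blank per `samples_per_blank` samples,
    plus `spare` extra reactions for pipetting loss (an assumption)."""
    blanks = math.ceil(n_samples / samples_per_blank)
    return n_samples + blanks + spare


def master_mix(per_reaction: dict, mix_volume_ul: float, reactions: int) -> dict:
    """Total volume of each reagent; water fills up to `mix_volume_ul` per reaction."""
    water = mix_volume_ul - sum(per_reaction.values())
    if water < 0:
        raise ValueError("reagents exceed the master mix volume")
    recipe = {"HPLC grade water": water, **per_reaction}
    return {name: round(volume * reactions, 2) for name, volume in recipe.items()}


if __name__ == "__main__":
    total = n_reactions(24)  # 24 samples + 3 blanks + 1 spare
    for reagent, volume in master_mix(MULTIPLEX_MIX_UL, 16.0, total).items():
        print(f"{reagent}: {volume} ul")
```

The same function can be reused for the indexing PCR by passing its per-reaction volumes and a master mix volume of 7 μl.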

Table 2 Indexing PCR master mix

| Reagent | Volume per reaction (μl) | Final concentration |
|---|---|---|
| HPLC grade water | Add up to 10 μl | |
| AmpliTaq Gold PCR buffer II (10×) | 1 | 1× |
| MgCl2 (25 mM) | 0.5–1.5 | 1.25–3.75 mM |
| Bovine serum albumin (BSA) (10 mg/ml) | 1 | 1 mg/ml |
| dNTPs (25 mM each) | 0.1 | 250 μM each |
| Indexing primer (2 μM) | 2 | 0.4 μM |
| AmpliTaq Gold (5 U/μl) | 0.05 | 0.25 U |
| Template (diluted multiplex PCR product) | 1 | |

3.3 Library Purification and Quantification

1. Purify the reactions on a magnetic 96-well plate using the Agencourt AMPure XP Reagent beads according to the manufacturer’s instructions. Elute and store DNA in 20 μl HPLC grade water (see Note 8).
2. Quantify the purified amplicon libraries with the Quant-iT™ PicoGreen® dsDNA quantification assay according to the manufacturer’s instructions.
3. Pool all sample libraries using an equal amount of DNA.
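Pooling an equal amount of DNA per library reduces to dividing a target mass by each measured concentration. The sketch below assumes that the amplicons are of similar length, so equal mass approximates equal molarity. The sample names, concentrations, and the usable volume per library are hypothetical values chosen only for illustration.

```python
def pooling_volumes(conc_ng_per_ul: dict, available_ul: float = 15.0) -> dict:
    """Volume (ul) of each library so that all contribute the same DNA mass.

    The target mass is set by the most dilute library, which is used up
    to `available_ul`; every other library contributes less volume.
    """
    if not conc_ng_per_ul:
        raise ValueError("no libraries given")
    if any(c <= 0 for c in conc_ng_per_ul.values()):
        raise ValueError("all concentrations must be positive")
    target_ng = min(conc_ng_per_ul.values()) * available_ul
    return {name: round(target_ng / c, 2) for name, c in conc_ng_per_ul.items()}


if __name__ == "__main__":
    # Hypothetical PicoGreen readings in ng/ul.
    readings = {"sample_A": 2.0, "sample_B": 4.0, "sample_C": 1.0}
    for name, volume in pooling_volumes(readings, available_ul=10.0).items():
        print(f"{name}: {volume} ul")
```

Setting the target by the most dilute library keeps every volume within what is available; libraries that are far more dilute than the rest may be better handled separately.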

3.4 Sequencing

1. Measure the DNA concentration of the pooled library on a Qubit fluorometer according to the manufacturer’s instructions, and dilute to the required concentration.
2. Sequence purified and pooled amplicon libraries on Illumina sequencers using CS1, CS2, and their reverse complements (Table 3) as sequencing primers according to the manufacturer’s instructions (see Note 9).
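Diluting the pool to a loading concentration usually requires converting the fluorometric reading from ng/μl to nM. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The mean library length and the target concentration are assumptions that depend on the amplicon design and on the sequencing platform, and the function names are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_length_bp: float,
                    g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA concentration from ng/ul to nM.

    nM = (ng/ul * 1e6) / (660 g/mol/bp * mean length in bp)
    """
    if mean_length_bp <= 0:
        raise ValueError("mean length must be positive")
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_length_bp)


def dilution(stock_nM: float, target_nM: float, final_volume_ul: float) -> tuple:
    """Return (stock volume, diluent volume) in ul, from C1*V1 = C2*V2."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical pool: 1.32 ng/ul, mean library length 200 bp.
    stock = ng_per_ul_to_nM(1.32, 200)
    print(dilution(stock, 4.0, 20.0))  # volumes to reach a 4 nM pool in 20 ul
```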

4 Notes

1. DNA can be extracted based on the protocol provided in Dabney et al. [18] and Chapter 3.
2. An additional measure to control for carry-over contamination by previous PCR products is the use of uracil-DNA glycosylase (UDG). For this purpose, 1 U of the heat-labile enzyme (Bioline) is added to the multiplex master mix, and an initial incubation step at 37 °C for 15 min is performed prior to the amplification. Also, dTTP is substituted with dUTP in all multiplex PCRs. If the reaction is contaminated by dUTP-containing PCR products from earlier amplifications, these fragments are degraded in the initial incubation. After that, the UDG is thermally inactivated during amplification [19]. Furthermore, as uracil is a common deamination product of cytosine, UDG can be used to remove such products from the extracted DNA [20].
3. For the initial multiplex PCR, target-specific (TS) primers were tagged with universal tags CS1 (forward tag) and CS2 (reverse tag). The primers used for the subsequent indexing PCR are composed of Illumina adapter sequences (PE1 corresponds to P5, PE2 corresponds to P7) as well as sequences complementary to the universal tags. Additionally, the reverse primer contains an individual 10 bp barcode (BC) sequence.
4. All plastic consumables should be sterile and DNA- and DNase-free.
5. The laboratory dedicated to working with ancient DNA should be equipped with several work spaces/laminar flow cabinets employed for different purposes, e.g., one for DNA extraction and one for PCR setup.
6. The specified cycling conditions are optimized for AmpliTaq Gold polymerase (Thermo Fisher Scientific). If a different polymerase is used, these conditions should be adjusted according to the manufacturer’s recommendations.
7. The Barcode Library for Illumina Sequencers is provided on four separate 96-well plates.
8. For library purification, the Agencourt AMPure XP PCR purification system (Beckman Coulter Genomics) was used with a 1.8-fold ratio of SPRI beads relative to the reaction volume.
9. Instructions for applying custom sequencing primers are provided by Illumina and Fluidigm, respectively.

Table 3 Target-specific (TS) primers with universal tags (CS1 forward, CS2 reverse) and indexing primers containing Illumina adapter sequences (PE1 = P5, PE2 = P7) and universal tag complements. Reverse primer contains a 10 bp barcode (BC)

| Primer | Sequence (5′–3′) |
|---|---|
| CS1-TS-F | ACACTGACGACATGGTTCTACA[TS-forward] |
| CS2-TS-R | TACGGTAGCAGAGACTTGGTCT[TS-reverse] |
| PE1-CS1 | AATGATACGGCGACCACCGAGATCTACACTGACGACATGGTTCTACA |
| PE2-BC-CS2 | CAAGCAGAAGACGGCATACGAGAT[BC]TACGGTAGCAGAGACTTGGTCT |
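The primer layout described in Note 3 and in the primer table can be written out as simple string concatenation. In the sketch below, the tag and adapter sequences are copied from the table. The target-specific sequences and the barcode in the usage example are hypothetical placeholders, and the function names are not part of the protocol.

```python
# Universal tags and Illumina adapter portions as listed in the primer table.
CS1 = "ACACTGACGACATGGTTCTACA"
CS2 = "TACGGTAGCAGAGACTTGGTCT"
PE1 = "AATGATACGGCGACCACCGAGATCT"  # P5 adapter portion
PE2 = "CAAGCAGAAGACGGCATACGAGAT"   # P7 adapter portion


def _check_dna(seq: str) -> str:
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA sequence: {seq!r}")
    return seq


def tagged_primers(ts_forward: str, ts_reverse: str) -> tuple:
    """First-round multiplex primers: universal tag + target-specific part."""
    return CS1 + _check_dna(ts_forward), CS2 + _check_dna(ts_reverse)


def indexing_primers(barcode: str) -> tuple:
    """Second-round primers: adapter + tag, with a 10 bp barcode in the reverse primer."""
    barcode = _check_dna(barcode)
    if len(barcode) != 10:
        raise ValueError("barcode must be 10 bp long")
    return PE1 + CS1, PE2 + barcode + CS2


if __name__ == "__main__":
    # Hypothetical target-specific primers and barcode, for illustration only.
    print(tagged_primers("GATTACAGATTACAGATTAC", "CTGTAATCTGTAATCTGTAA"))
    print(indexing_primers("ACGTACGTAC"))
```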


Acknowledgments

Thanks to Johanna Paijmans for the opportunity to contribute to this special edition and to the Deutsche Forschungsgemeinschaft (DFG) for financial support.

References

1. Hofreiter M, Serre D, Poinar HN et al (2001) Ancient DNA. Nat Rev Genet 2(5):353–359
2. Pääbo S, Poinar H, Serre D et al (2004) Genetic analyses from ancient DNA. Annu Rev Genet 38(1):645–679
3. Morin PA, McCarthy M (2007) Highly accurate SNP genotyping from historical and low-quality samples. Mol Ecol Notes 7(6):937–946
4. Narum SR, Buerkle CA, Davey JW et al (2013) Genotyping-by-sequencing in ecological and conservation genomics. Mol Ecol 22(11):2841–2847
5. Cronin MA, Cánovas A, Bannasch DL et al (2014) Single nucleotide polymorphism (SNP) variation of wolves (Canis lupus) in Southeast Alaska and comparison with wolves, dogs, and coyotes in North America. J Hered. https://doi.org/10.1093/jhered/esu075
6. Ellegren H (2014) Genome sequencing and population genomics in non-model organisms. Trends Ecol Evol 29(1):51–63
7. Svensson EM, Anderung C, Baubliene J et al (2007) Tracing genetic change over time using nuclear SNPs in ancient and modern cattle. Anim Genet 38(4):378–383
8. Svensson EM, Telldahl Y, Sjöling E et al (2012) Coat colour and sex identification in horses from Iron Age Sweden. Ann Anat 194(1):82–87
9. Bouakaze C, Keyser C, Crubézy E et al (2009) Pigment phenotype and biogeographical ancestry from ancient skeletal remains: inferences from multiplexed autosomal SNP analysis. Int J Legal Med 123(4):315–325
10. Pruvost M, Reissmann M, Benecke N et al (2012) From genes to phenotypes: evaluation of two methods for the SNP analysis in archaeological remains: pyrosequencing and competitive allele specific PCR (KASPar). Ann Anat 194(1):74–81
11. Fortes GG, Speller CF, Hofreiter M et al (2013) Phenotypes from ancient DNA: approaches, insights and prospects. BioEssays 35(8):690–695
12. Hofreiter M, Paijmans JLA, Goodchild H et al (2015) The future of ancient DNA: technical advances and conceptual shifts. BioEssays 37(3):284–293
13. Stiller M, Knapp M, Stenzel U et al (2009) Direct multiplex sequencing (DMPS): a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA. Genome Res 19:1843–1848
14. Wutke S et al (2016) The origin of ambling horses. Curr Biol 26(15):R697–R699
15. Wutke S et al (2016) Spotted phenotypes lost attractiveness in the Middle Ages. Sci Rep 6:38548
16. Wutke S, Sandoval-Castellanos E, Benecke N, Döhle H-J, Friederich S, Gonzalez J, Hofreiter M, Lõugas L, Magnell O, Malaspinas A-S, Morales-Muñiz A, Orlando L, Reissmann M, Trinks A, Ludwig A (2018) Decline of genetic diversity in ancient domestic stallions in Europe. Sci Adv 4(4):eaap9691
17. Sandoval-Castellanos E, Wutke S, Gonzalez-Salazar C, Ludwig A (2017) Coat colour adaptation of post-glacial horses to increasing forest vegetation. Nat Ecol Evol 1(12):1816–1819
18. Dabney J, Knapp M, Glocke I et al (2013) Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc Natl Acad Sci 110(39):15758–15763
19. Longo MC, Berninger MS, Hartley JL (1990) Use of uracil DNA glycosylase to control carryover contamination in polymerase chain reactions. Gene 93(1):125–128
20. Willerslev E, Cooper A (2005) Ancient DNA. Proc R Soc London B 272(1558):3–16

Chapter 16

Targeted Amplification and Sequencing of Ancient Environmental and Sedimentary DNA

Ruth V. Nichols, Emily Curd, Peter D. Heintzman, and Beth Shapiro

Abstract

All organisms release their DNA into the environment through processes such as excretion and the senescence of tissues and limbs. This DNA, often referred to as environmental DNA (eDNA) or sedimentary ancient DNA (sedaDNA), can be recovered from both present-day and ancient soils, fecal samples, bodies of water and lake cores, and even air. While eDNA is a potentially useful record of past and present biodiversity, several challenges complicate data generation and interpretation of results. Most importantly, eDNA samples tend to be highly taxonomically mixed, and the target organism or group of organisms may be present at very low abundance within this mixture. To overcome this challenge, enrichment approaches are often used to target specific taxa of interest. Here, we describe a protocol to amplify metabarcodes, i.e., short, variable loci that identify lineages within broad taxonomic groups (e.g., plants, mammals), using the polymerase chain reaction (PCR) with established generic “barcode” primers. We also provide a catalog of animal and plant barcode primers that, because they target relatively short fragments of DNA, are potentially suitable for use with degraded DNA.

Key words Ancient DNA, Universal primers, Environmental DNA, sedaDNA, Metabarcoding, Amplicon sequencing

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_16, © Springer Science+Business Media, LLC, part of Springer Nature 2019

1 Introduction

Environmental DNA (eDNA) is generally defined as DNA deposited by an organism into its environment. It comes from a variety of sources, including water [1], feathers [2], feces [3], soil [4], and even residual saliva left on food items [5, 6]. It is a powerful tool because it can potentially replace classical methods, such as field surveys, which are time-consuming, field intensive, and require highly trained specialists. Consequently, its use has become increasingly important for environmental monitoring and conservation [7]. With eDNA one simply needs to extract the DNA, sequence it, and match it to DNA databases. In addition, eDNA can be more sensitive than classical methods, for example, by alerting environmental managers to the presence of an invasive species before it is detected by eye [1].

Sedimentary ancient DNA (sedaDNA) can be found in permafrost, sediments, and coprolites, and often the protocols used for eDNA can be used for sedaDNA samples as well, because both tend to have problems with respect to DNA fragmentation and preservation and the presence of inhibitors.

The field of sedaDNA is currently undergoing an expansion, with new approaches having been recently developed. These include direct shotgun sequencing [8, 9] and targeted capture enrichment [10] of sedaDNA extracts, and genome skimming for cheaply expanding reference databases of multicopy genomic markers [11]. As these methods have yet to be widely implemented and refined, we focus here on the more traditional approach of metabarcoding, which is the targeted amplification of loci using universal or generic/conserved primers.

Targeted amplification using the polymerase chain reaction (PCR) is often used in sedaDNA research because it can take single copies of DNA template and create many more copies, amplifying them to concentrations detectable by sequencing. Generic conserved primers allow for the amplification of “barcode” loci across broad taxonomic groups (Table 1). Amplicon length is an important consideration because the majority of ancient DNA molecules are highly fragmented and are often less than 100 base pairs in length.

PCR inhibitors are often co-extracted alongside sedaDNA, which can cause the PCR to fail. Serum albumins (e.g., bovine serum albumin) can bind to these inhibitors, rendering them inert, and are often added to PCR mixtures to prevent inhibition. These additives are commonly used when an initial test PCR exhibits inhibition. Inhibition can be easily detected using either qPCR, whereby the qPCR curve shows low amplification efficiency, or by spiking the sedaDNA extract into a positive PCR control, which will suppress amplification of the positive control if inhibitors are present. Another approach to dealing with PCR inhibitors is to dilute the DNA extract using a DNA elution buffer or sterile water, although this also dilutes the DNA template. It is important to note that levels of inhibition can be unique to each sample and thus optimization may be required for each sample.

After successful PCR amplification, which can be determined by gel electrophoresis, the resulting amplicons need to be sequenced. High-throughput methods are ideal for simultaneously sequencing the thousands of amplicons often necessary to reveal the diversity of amplicons present. For this, amplicons first need to be converted to DNA library molecules. The primary aim of this library preparation is to add sequences, termed adapters, to the ends of the amplicons, so that they will be recognized by the sequencing platform. There are a multitude of different ways to prepare libraries, and adapter sequences vary depending on the sequencing platform to be used, such as Illumina or Ion Torrent.

Table 1 Primer pairs suitable for sedaDNA analyses (amplicon length < 150 bp)

| Taxonomic group | Locus | Forward primer | Reverse primer | Human blocking primer | Amplicon length (bp) | Ta (°C) | Reference |
|---|---|---|---|---|---|---|---|
| Tracheophytes (vascular plants) | trnL-P6 loop (cp) | GGGCAATCCTGAGCCAA | CCATTGAGTCTCTGCACCTATC | N/A | 59–122 | 50–55 | [12] |
| Bryophytes (nonvascular plants) | trnL-P6 loop (cp) | GATTCAGGGAAACTTAGGTTG | CCATTGAGTCTCTGCACC | N/A | 82–119 | 52 | [13] |
| Enchytraeidae (oligochaete worm family) | 12S (mt) | GCTGCACTTTGACTTGAC | AGCCTGTGTACTGCTGTC | N/A | 69–90 | 51 | [13] |
| Lumbricidae (earthworm family) | 16S (mt) | CAAGAAGACCCTATAGAGCTT | GGTCGCCCCAACCGAAT | N/A | 68–71 | 56 | [14] |
| Lumbricidae (earthworm family) | 16S (mt) | ATTCGGTTGGGGCGACC | CTGTTATCCCTAAGGTAGCTT | N/A | 111–114 | 58 | [14] |
| Coleoptera (beetles) | 16S (mt) | TGCAAAGGTAGCATAATMATTAG | TCCATAGGGTCTTCTCGTC | N/A | 136–151 | 55 | [13] |
| Insects | COI (mt) | GTAAAGTAAGCTCGTGTATC | TTATGCTATATTANCTATTGG | N/A | 96 | 58 | [15] |
| Vertebrates | 12S (mt) | ACTGGGATTAGATACCCC | TAGAACAGGCTCCTCTAG | N/A | 132–153 | 52 | [16] |
| Teleostei (teleost fish) | 12S (mt) | ACACCGCCCGTCACTCT | CTTCCGGTACACTTACCATG | ACCCTCCTCAAGTATACTTCAAAGGAC-SPC3I | 93–115 | 55 | [17] |
| Batrachia (frogs and salamanders) | 12S (mt) | ACACCGCCCGTCACCCT | GTAYACTTACCATGTTACGACTT | TCACCCTCCTCAAGTATACTTCAAAGGCA-SPC3I | 62–105 | 55 | [17] |
| Aves (birds) | 16S (mt) | CATAAGACGAGAAGACCCTGTGGA | TCCAAGGTCGCCCCAACCGAA | AGACCCTATGGAGCTTTAATTTATTAATGCAAAC-SPC3I | c. 125 | 52 | [18] |
| Aves (birds) | 16S (mt) | CCTTGGAGAAAAACAAANCCTCCAAA | TCCCTGGGGTAGCTTGGTCCAT | | c. 120 | 52 | [18] |
| Aves (birds) | 12S (mt) | GATTAGATACCCCACTATGC | GTTTTAAGCGTTTGTGCTCG | | 86–96 | 58 | [13] |
| Mammals | 16S (mt) | CGGTTGGGGTGACCTCGGA | GCTGTTATCCCTAGGGTAACT | CGGTTGGGGCGACCTCGGAGCAGAACCC-SPC3I | 126–138 | 55 | [19, 20] |
| Mammals | 16S (mt) | CGAGAAGACCCTATGGAGCT | CCGAGGTCRCCCCAACC | GGAGCTTTAATTTATTAATGCAAACAGTACCC-SPC3I | 108–128 | 50 | [21] |

Amplicon lengths include primer sequence. Ta annealing temperature, cp chloroplast DNA, mt mitochondrial DNA, SPC3I C3-spacer. The mammalian primer set of [21] is comparable to that presented in [22]

Metabarcoding of Ancient Environmental DNA


Here, we use the Illumina TruSeq adapters (although see Note 1), which allow for sequencing on Illumina instruments, such as the MiSeq. One library preparation approach is to append the adapters to the 5′ ends of the barcoding primers, so that the adapters are incorporated during the metabarcoding PCR. The final step is an indexing PCR in which each sample is given its own unique index sequence to differentiate it from the others on the sequencing run (see Note 2). We call this the two-step library prep method.
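As a minimal sketch of how adapter-tailed primers for this two-step approach are assembled, the snippet below prepends the adapter tails listed in Table 2 to one barcoding primer pair from Table 1 (the vascular plant trnL-P6 loop primers). The helper name `two_step_primers` is hypothetical and is used only for illustration.

```python
# Adapter tails for the first (metabarcoding) PCR, as listed in Table 2.
P5_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_TAIL = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def two_step_primers(forward_primer, reverse_primer):
    """Prepend the adapter tails to the 5' ends of a barcoding primer pair.

    Hypothetical helper; all sequences are written 5' to 3'.
    """
    return {
        "P5_2-step": P5_TAIL + forward_primer,
        "P7_2-step": P7_TAIL + reverse_primer,
    }


# trnL-P6 loop primers for vascular plants (Table 1).
oligos = two_step_primers("GGGCAATCCTGAGCCAA", "CCATTGAGTCTCTGCACCTATC")
for name, sequence in oligos.items():
    print(name, len(sequence), sequence)
```

The tails add 33 and 34 nucleotides to the forward and reverse primers, respectively, which is worth keeping in mind when checking product sizes on a gel.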

2 Materials

All reagents and plastics should be sterile and free of DNA and DNases. All solutions should be molecular biology grade or similar.

2.1 Primer Selection

1. Generic primers should anneal to conserved regions of a locus that flank a short, variable region that can be used to delimit taxa at the genus or, ideally, species level.
2. Because ancient DNA is degraded, molecules are often shorter than 100 base pairs (bp), so the total amplicon length, which includes the primers, should be of comparable length or shorter.
3. Primers often target multicopy loci, such as those within the mitochondrial or chloroplast genomes, to maximize the amount of template available for amplification.
4. Primers can be tagged, whereby a unique short sequence (7–8 bp) is added to the 5′ end of the forward and reverse primers (see Note 3).
5. A selection of previously published generic primers for targeting multicellular groups, such as vascular plants, insects, birds, and mammals, is presented in Table 1.
6. For mammalian primers, it is critical to also use a human blocking primer to reduce the amplification of background human contamination [20].
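One way to obtain a set of 8 bp tags for tagged primers is sketched below. It assumes, as a design choice that is not part of this protocol, that all tags should differ from one another at three or more positions, so that a single sequencing error cannot turn one tag into another. The function `design_tags`, the distance threshold, and the random seed are illustrative only.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_tags(n_tags, length=8, min_distance=3, seed=0):
    """Greedily pick tags that are at least `min_distance` apart.

    Candidates are all possible sequences of the given length, visited in a
    reproducible random order; a candidate is kept only if it is far enough
    from every tag already chosen.
    """
    rng = random.Random(seed)
    candidates = ["".join(p) for p in itertools.product("ACGT", repeat=length)]
    rng.shuffle(candidates)
    tags = []
    for candidate in candidates:
        if all(hamming(candidate, tag) >= min_distance for tag in tags):
            tags.append(candidate)
            if len(tags) == n_tags:
                return tags
    raise ValueError("could not find enough tags with these settings")


tags = design_tags(n_tags=12)
# Tags are added to the 5' end of the primer (here the trnL forward primer).
tagged_forward = [tag + "GGGCAATCCTGAGCCAA" for tag in tags]
```

In practice, additional filters (for example, avoiding long homopolymer runs) may be desirable; they are omitted here to keep the sketch short.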

2.2 PCR Reagents

PCR reactions can either be assembled from individual reagents by the researcher or set up using a ready-made master mix. Here, we present an example setup using a ready-made master mix, as recommended by [23].

1. Multiplex PCR master mix (Qiagen). This mix includes a modified Taq DNA polymerase, deoxynucleotide triphosphates (dNTPs), magnesium ions, and a multiplex PCR buffer. See also Note 4 for alternatives.
2. Forward and reverse primers (10 μM each). Primers can have adapter sequences attached if using a two-step library preparation protocol (see Subheading 3.4 and Table 2; see also Note 5).


Table 2 Primer sequences that are required for the two-step library preparation of metabarcoding-derived amplicons

| Oligo | Sequence |
|---|---|
| P5_2-step | ACACTCTTTCCCTACACGACGCTCTTCCGATCT[Forward primer] |
| P7_2-step | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT[Reverse primer] |
| P5_indexing_oligo | AATGATACGGCGACCACCGAGATCTACACNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT |
| P7_indexing_oligo | CAAGCAGAAGACGGCATACGAGATNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC |

The nucleotides highlighted in bold in the P5_indexing_oligo are only required for dual indexing (see Note 2). Oligonucleotide sequences are © 2019 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited

3. Blocking primer (50 μM). This is optional.
4. Water.
5. DNA template.
6. Barrier or filter pipette tips.
7. PCR reaction tubes or plates and lids.
8. Thermocycler with heated lid.

2.3 Gel Electrophoresis

1. Agarose gel (2%).
2. 50× TAE (Tris-acetate-EDTA, pH 8.3). Dilute with water to 1× for use as running buffer.
3. 6× loading dye.
4. 50 bp or 100 bp DNA ladder.
5. Agarose gel electrophoresis rig and power supply.
6. Nontoxic nucleic acid gel stain.
7. UV transilluminator.
8. Microwave.

2.4 PCR Product Purification

Amplicon-containing PCR products can be purified using commercial kits (such as Qiagen MinElute) or with Solid Phase Reversible Immobilization (SPRI) beads. On larger scales, SPRI beads may be more time- and cost-efficient, and this is the protocol we present here. The required reagents and equipment are:

1. SPRI beads (e.g., Agencourt AMPure XP) (see Note 6).
2. 80% ethanol. Freshly prepared.
3. Elution buffer. This can be water or TE (Tris-HCl-EDTA) buffer, pH 8.0.


4. PCR plates and lids.
5. Vortexer.
6. Centrifuge.
7. Magnetic stand for 96-well plates.

2.5 Two-Step Library Indexing

The final indexing PCR step requires the following reagents:

1. 2× HiFi HotStart ReadyMix (Kapa Biosystems).
2. P5 and P7 indexing primers (10 μM each; Table 2). See Note 2 regarding single and dual indexing.
3. Water.
4. Purified metabarcoding PCR products.
5. Barrier or filter pipette tips.
6. PCR reaction tubes or plates and lids.
7. Thermocycler with heated lid.

2.6 Library Quantification

1. Qubit fluorometer (Invitrogen).
2. Qubit High Sensitivity DNA assay kit.
3. Tubes for Qubit assays.
4. Pipette tips.

3 Methods

3.1 PCR Amplification

1. Make a PCR master mix, consisting of all reagents except for the DNA template (Table 3; although see Note 3 if using tagged primers). Calculate the volumes required by multiplying the values in the last column of Table 3 by the number of reactions desired, plus three extra for pipetting error.

Table 3 An example metabarcoding PCR recipe

| Reagent | Stock concentration | Final concentration | Volume for a single reaction (μL) |
|---|---|---|---|
| Qiagen multiplex PCR master mix | 2× | 1× | 12.5 |
| Forward primer | 10 μM | 0.2 μM | 0.5 |
| Reverse primer | 10 μM | 0.2 μM | 0.5 |
| Water | | | 10.5 |
| Template | | | 1 |
| Total | | | 25 |

This example PCR recipe does not include a blocking primer. If this is required, subtract 2.5 μL of water, and replace with the blocking oligo (final concentration: 1 μM). The ratio of blocking primer to generic primer used here is an optimal ratio of 5:1 (following [20] and [17])
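The master mix calculation in step 1 can be sketched as follows. The per-reaction volumes are taken from Table 3 (without a blocking primer); the function names `master_mix` and `final_concentration` are hypothetical.

```python
# Per-reaction volumes (uL) from Table 3; the 1 uL of template is added
# to each well separately and is therefore not part of the master mix.
RECIPE_UL = {
    "Qiagen multiplex PCR master mix (2x)": 12.5,
    "Forward primer (10 uM)": 0.5,
    "Reverse primer (10 uM)": 0.5,
    "Water": 10.5,
}


def master_mix(n_reactions, extra=3):
    """Scale the recipe to n_reactions plus `extra` reactions for pipetting error."""
    n = n_reactions + extra
    return {reagent: volume * n for reagent, volume in RECIPE_UL.items()}


def final_concentration(stock, volume_ul, total_ul=25.0):
    """Final concentration after dilution (C1 * V1 = C2 * V2), in the units of `stock`."""
    return stock * volume_ul / total_ul


mix = master_mix(n_reactions=21)            # 21 reactions + 3 extra = 24
per_well_ul = sum(RECIPE_UL.values())       # volume of master mix per well
primer_uM = final_concentration(10.0, 0.5)  # 10 uM stock, 0.5 uL in 25 uL
```

For 21 reactions this gives 300 μL of master mix, 12 μL of each primer, and 252 μL of water, dispensed at 24 μL per well.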


Table 4 The cycling parameters required for the metabarcoding PCR described in Table 3

| Step | Temperature (°C) | Time |
|---|---|---|
| 1. Enzyme activation | 95 | 15 min |
| 2. Template denaturation | 94 | 30 s |
| 3. Primer annealing | See Table 1 | 90 s |
| 4. Strand extension | 72 | 60 s |
| 5. Final extension | 72 | 5 min |
| 6. Temporary incubation | 12 | ∞ |

Steps 2–4 should be repeated for a total of 45 cycles (although see Note 8)
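For planning purposes, the nominal duration of the program in Table 4 can be worked out as below. This is simple arithmetic on the hold times only; it ignores ramping between temperatures, which depends on the thermocycler, and excludes the final temporary incubation. The function name is hypothetical.

```python
def program_minutes(activation_s, cycle_steps_s, n_cycles, final_extension_s):
    """Sum of programmed hold times in minutes (ramp times are not included)."""
    total_s = activation_s + n_cycles * sum(cycle_steps_s) + final_extension_s
    return total_s / 60


# Table 4: 15 min activation, 45 cycles of 30 s + 90 s + 60 s, 5 min final extension.
metabarcoding_minutes = program_minutes(15 * 60, [30, 90, 60], 45, 5 * 60)
print(metabarcoding_minutes)
```

The hold times alone add up to a little over two and a half hours, so the actual run will take somewhat longer once ramping is included.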

2. After making the master mix, pipette 24 μL of the master mix into each well of the PCR tubes or plate.
3. Then pipette 1 μL of DNA template into each well (see Note 7).
4. Ensure that the PCR tube or plate lids are properly closed.
5. Program the thermocycler with the cycling conditions listed in Table 4. See Notes 8 and 9 regarding cycle number and the use of touchdown PCR for degenerate primers.
6. Store the PCR products at 4 °C.

3.2 Gel Electrophoresis

1. Heat a 2% agarose gel mix in a microwave until the agarose has fully dissolved or the gel has fully melted.
2. Allow to cool for 5 min.
3. Add the nontoxic nucleic acid gel stain. Gently and thoroughly mix.
4. Pour the melted agarose into a cast with a comb and allow to cool and solidify into a gel.
5. Move the gel to the electrophoresis rig and cover with 1× TAE. Make sure the gel is orientated correctly; DNA is negatively charged and so will travel toward the positively charged anode.
6. Mix 5 μL of each PCR product from Subheading 3.1, step 6 with 1 μL of loading dye. Load the PCR product and loading dye mix into the gel wells carefully. Leave several empty wells for the DNA ladder, which should be added last.
7. Connect the leads to the gel box, making sure that the DNA will run in the correct direction (down the gel). Set the desired voltage and start the gel box. Check your gel after 30 min, and run for longer if necessary.


8. After running the gel, visualize it using a UV transilluminator. Do this to check that the PCR was successful and that the PCR product is of the expected length based on comparison to the DNA ladder.

3.3 PCR Product Purification

1. Prepare a fresh batch of 80% ethanol by diluting absolute ethanol with water. The final volume needs to be ~500 μL per PCR product.
2. Add a volume of SPRI beads that is 3× that of the remaining PCR product from Subheading 3.1, step 6 (although see Note 10), reseal with lids, vortex to mix, and let stand for 15 min at room temperature.
3. Briefly centrifuge the PCR plate at 2000 × g to collect the liquid at the bottom of the wells.
4. Place the plate on a 96-well plate magnetic stand, discard the lids, and let stand for 2–3 min to separate the beads from the supernatant.
5. Remove and discard the supernatant without disturbing the beads.
6. While leaving the plate on the magnetic stand, wash the beads by adding 190 μL of the freshly prepared 80% ethanol.
7. Let stand for 1 min and remove the supernatant.
8. Repeat the previous two steps.
9. Remove any residual ethanol, and let the beads air-dry for no more than 15 min at room temperature (see Note 11).
10. Add 20 μL of elution buffer to the wells and seal the plate with lids.
11. Remove the plate from the magnetic rack, and resuspend the beads by vortexing.
12. Let stand for 1 min, and then briefly centrifuge the plate, as described above.
13. Place the plate back on the magnetic stand, let stand for 1 min, and transfer the supernatant (the purified PCR product) to a new 96-well plate.
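The volumes needed for steps 1 and 2 can be worked out as in the sketch below. It assumes a 25 μL PCR from which 5 μL was used for gel electrophoresis (Subheading 3.2, step 6), the 3× bead ratio, and ~500 μL of 80% ethanol per PCR product; the function name `spri_plan` is hypothetical.

```python
def spri_plan(pcr_volume_ul=25.0, gel_aliquot_ul=5.0, bead_ratio=3.0,
              n_samples=1, ethanol_per_sample_ul=500.0):
    """Bead volume per sample and total 80% ethanol to prepare for a batch."""
    remaining = pcr_volume_ul - gel_aliquot_ul        # product left after the gel
    ethanol_total = ethanol_per_sample_ul * n_samples
    absolute_ethanol = ethanol_total * 80 / 100       # 80% (v/v) ethanol
    return {
        "remaining_pcr_product_ul": remaining,
        "spri_beads_ul": remaining * bead_ratio,
        "absolute_ethanol_ul": absolute_ethanol,
        "water_ul": ethanol_total - absolute_ethanol,
    }


plan = spri_plan(n_samples=96)
print(plan)
```

For a full 96-well plate this corresponds to 60 μL of beads per well and 48 mL of 80% ethanol (38.4 mL absolute ethanol plus 9.6 mL water), which comfortably covers the two 190 μL washes per well.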

3.4 Two-Step Amplicon Library Indexing

For the two-step protocol, the 3′ ends of the adapters will have been added to the amplicons during the metabarcoding PCR (Subheading 3.1, step 1). The full-length adapters, including index sequences, are then added by the following indexing PCR.

1. Make a master mix according to Table 5. Calculate the total volumes needed for the number of reactions you are setting up, plus three for pipetting error. This master mix should only include the water and Kapa HiFi HotStart ReadyMix but can also include the P5 indexing primer if this does not include an index (see Table 2 and Note 2).
2. Pipette 19 μL of the master mix (or 19.5 μL if it includes the P5 indexing primer) into each PCR tube or plate well.
3. Pipette each indexing primer separately into each PCR tube or plate well.
4. Pipette the template PCR product with adapters into each PCR tube or plate well.
5. Ensure that the PCR tube or plate lids are properly closed.
6. Program the thermocycler with the cycling conditions listed in Table 6. Cycle through template denaturation, primer annealing, and strand extension (Table 6, steps 2–4) for a total of eight cycles.
7. Store the PCR products at 4 °C.

Table 5 An indexing PCR recipe for the second step of the two-step library preparation protocol

| Reagent | Stock concentration | Final concentration | Volume for a single reaction (μL) |
|---|---|---|---|
| Kapa HiFi HotStart ReadyMix | 2× | 1× | 12.5 |
| Water | | | 6.5 |
| P5 indexing primer | 10 μM | 0.2 μM | 0.5 |
| P7 indexing primer | 10 μM | 0.2 μM | 0.5 |
| Template (amplicons with adapters attached) | | | 5 |
| Total | | | 25 |

Indexing primers are listed in Table 2

Table 6 The cycling parameters required for the indexing PCR described in Table 5

| Step | Temperature (°C) | Time |
|---|---|---|
| 1. Enzyme activation | 95 | 3 min |
| 2. Template denaturation | 98 | 20 s |
| 3. Primer annealing | 65 | 30 s |
| 4. Strand extension | 72 | 45 s |
| 5. Final extension | 72 | 1 min |
| 6. Temporary incubation | 4 | ∞ |

Steps 2–4 should be repeated for a total of 8 cycles
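When dual indexing is used, each library needs a P5/P7 index combination that no other library on the run shares. A minimal sketch of such an assignment is given below; the function name and the 7-nucleotide index sequences are hypothetical placeholders, not recommended indexes.

```python
def assign_dual_indexes(samples, p5_indexes, p7_indexes):
    """Give every sample a P5/P7 index combination used by no other sample."""
    if len(samples) > len(p5_indexes) * len(p7_indexes):
        raise ValueError("not enough index combinations for the number of samples")
    # Enumerate combinations in a fixed order and hand them out one per sample.
    combos = ((p5, p7) for p5 in p5_indexes for p7 in p7_indexes)
    return dict(zip(samples, combos))


# Hypothetical index sequences, for illustration only.
layout = assign_dual_indexes(
    ["S1", "S2", "S3"],
    p5_indexes=["AAACCCG", "TTTGGGA"],
    p7_indexes=["CATCATC", "GTAGTAG"],
)
```

With two P5 and two P7 indexes, up to four libraries can be distinguished; recording this layout alongside the plate map makes demultiplexing less error-prone.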

3.5 Amplicon Library Purification and Quantification

1. For each library, perform a SPRI bead purification, based on the method in Subheading 3.3. Make sure to adjust the SPRI bead to library volume ratio to account for the increased adapter length (total adapter length, including indexes, is ~130 bp; see Note 10).
2. Quantify the libraries using the Qubit High Sensitivity DNA assay and Qubit Fluorometer, following the manufacturer’s instructions.
3. The amplicon libraries are now ready for downstream pooling and sequencing.
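For pooling, mass concentrations from the Qubit are usually converted to molar concentrations. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA, which is an assumption not stated in this protocol, together with a hypothetical mean library length (amplicon plus ~130 bp of adapters). The function names are illustrative.

```python
def ng_per_ul_to_nM(conc_ng_ul, mean_length_bp, da_per_bp=660.0):
    """Convert a dsDNA concentration from ng/uL to nM (assumes ~660 g/mol per bp)."""
    return conc_ng_ul * 1e6 / (da_per_bp * mean_length_bp)


def equimolar_volumes(libraries, target_fmol_each):
    """Volume (uL) of each library that contains `target_fmol_each` femtomoles.

    `libraries` maps a library name to (concentration in ng/uL, mean length in bp).
    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {
        name: target_fmol_each / ng_per_ul_to_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical Qubit readings for two libraries of ~250 bp.
volumes = equimolar_volumes({"lib_A": (6.6, 250), "lib_B": (3.3, 250)},
                            target_fmol_each=20)
```

Here the more concentrated library contributes half the volume of the other, so that both are represented by the same number of molecules in the pool.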

4 Notes

1. The Illumina TruSeq adapter sequences are used in this protocol. However, Illumina also offers the Nextera adapters, which may be better suited to some applications.
2. Dual indexing, whereby both the P5 and P7 adapters contain an index, can allow for greater multiplexing of libraries for sequencing. As both indexes need to be read correctly during demultiplexing, dual indexing also reduces the chance of sequence reads being assigned to the wrong library, which can occur through index swapping.
3. Primers may be tagged with seven or eight unique nucleotides at the 5′ end to distinguish sequences produced from different PCR reactions (e.g., [24]). The advantage of this approach is that PCR products from many reactions may be pooled prior to library preparation. The disadvantage is that PCR efficiency may be affected and tag jumping may artificially inflate diversity estimates [25]. However, if both the forward and reverse primer are uniquely tagged (both tags are only used once in a pool), then the potential incidence of tag misassignment is greatly reduced. If using tagged primers, add these separately to each PCR rather than including them in the master mix.
4. Two other polymerases that are commonly used in metabarcoding studies are Platinum HiFi Taq (Invitrogen) and AmpliTaq Gold (Applied Biosystems). Platinum HiFi is the most expensive but yields the fewest polymerase errors, the Qiagen Multiplex PCR master mix amplifies mixed templates most evenly [23], and AmpliTaq Gold is the most affordable.
5. Ligation-based library preparation approaches, including PCR-free methods, may be preferable when using tagged primers (see Note 3). In these cases, do not include the adapter sequence in the forward and reverse metabarcoding primers.


6. SPRI beads can also be prepared in house using the Rohland and Reich protocol [26]. It is important to test any new batches of SPRI beads prior to use (see Note 10).
7. The volume of DNA template can be increased by reducing the volume of water by a corresponding amount. Increased DNA template may be required for extracts with poor DNA preservation, but it is advised that the DNA template volume not exceed 3 μL in a 25 μL reaction volume.
8. The number of PCR cycles can be adjusted based on DNA template concentration, but 40–50 cycles are generally used for the amplification of sedaDNA.
9. For primers that contain degenerate bases, it may be advisable to perform a touchdown PCR. In this setup, an initially high annealing temperature is gradually decreased to the optimal annealing temperature during the first 10–20 PCR cycles. This should reduce the signal from off-target amplicons.
10. The ratio of SPRI bead volume to PCR product volume dictates the minimum length of DNA that will bind to the SPRI beads and is therefore recoverable after purification. The higher this ratio, the shorter this minimum length. We provide a conservative value of 3×, but note that fewer SPRI beads could be used, depending on the amplicon fragment length. A chart of SPRI bead ratios and DNA length cutoffs should be provided with the SPRI beads.
11. SPRI beads may be dried more quickly by placing the plate in a desiccator or incubator at 37 °C. The beads should only be dried until all residual ethanol traces have evaporated.
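The touchdown setup described in Note 9 can be written out as a per-cycle list of annealing temperatures. In the sketch below, the starting temperature, the number of touchdown cycles, and the linear decrease are illustrative assumptions; the optimal annealing temperature would come from Table 1, and the function name is hypothetical.

```python
def touchdown_schedule(start_c, optimal_c, touchdown_cycles, total_cycles):
    """Annealing temperature for each cycle of a touchdown PCR.

    The temperature falls linearly from `start_c` during the first
    `touchdown_cycles` cycles and then stays at `optimal_c`.
    """
    step = (start_c - optimal_c) / touchdown_cycles
    schedule = []
    for cycle in range(total_cycles):
        if cycle < touchdown_cycles:
            schedule.append(round(start_c - step * cycle, 2))
        else:
            schedule.append(float(optimal_c))
    return schedule


# Hypothetical example: 65 C falling to 55 C over the first 10 of 45 cycles.
schedule = touchdown_schedule(start_c=65, optimal_c=55,
                              touchdown_cycles=10, total_cycles=45)
```

Many thermocyclers can program this directly as a per-cycle temperature decrement, in which case only the starting temperature and the step size are needed.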

Acknowledgments

P.D.H. acknowledges support from the Research Council of Norway (Grant 250963: “ECOGEN”).

References

1. Jerde CL, Mahon AR, Chadderton WL, Lodge DM (2011) “Sight-unseen” detection of rare aquatic species using environmental DNA. Conserv Lett 4:150–157. https://doi.org/10.1111/j.1755-263X.2010.00158.x
2. Taberlet P, Bouvet J (1991) A single plucked feather as a source of DNA for bird genetic studies. Auk 108:959–960
3. Soininen EM, Valentini A, Coissac E et al (2009) Analysing diet of small herbivores: the efficiency of DNA barcoding coupled with high-throughput pyrosequencing for deciphering the composition of complex plant mixtures. Front Zool 6:16. https://doi.org/10.1186/1742-9994-6-16
4. Sønstebø JH, Gielly L, Brysting AK et al (2010) Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Mol Ecol Resour 10:1009–1018. https://doi.org/10.1111/j.1755-0998.2010.02855.x
5. Wheat RE, Allen JM, Miller SDL et al (2016) Environmental DNA from residual saliva for efficient noninvasive genetic monitoring of Brown Bears (Ursus arctos). PLoS One 11:e0165259. https://doi.org/10.1371/journal.pone.0165259
6. Nichols RV, Königsson H, Danell K, Spong G (2012) Browsed twig environmental DNA: diagnostic PCR to identify ungulate species. Mol Ecol Resour 12:983–989. https://doi.org/10.1111/j.1755-0998.2012.03172.x
7. Bohmann K, Evans A, Gilbert MTP et al (2014) Environmental DNA for wildlife biology and biodiversity monitoring. Trends Ecol Evol 29:358–367. https://doi.org/10.1016/j.tree.2014.04.003
8. Pedersen MW, Ruter A, Schweger C et al (2016) Postglacial viability and colonization in North America’s ice-free corridor. Nature 537:45–49. https://doi.org/10.1038/nature19085
9. Graham RW, Belmecheri S, Choy K et al (2016) Timing and causes of mid-Holocene mammoth extinction on St. Paul Island, Alaska. Proc Natl Acad Sci U S A 113:9310–9314. https://doi.org/10.1073/pnas.1604903113
10. Slon V, Hopfe C, Weiß CL et al (2017) Neandertal and Denisovan DNA from Pleistocene sediments. Science 356:605–608. https://doi.org/10.1126/science.aam9695
11. Coissac E, Hollingsworth PM, Lavergne S, Taberlet P (2016) From barcodes to genomes: extending the concept of DNA barcoding. Mol Ecol 25:1423–1428. https://doi.org/10.1111/mec.13549
12. Taberlet P, Coissac E, Pompanon F et al (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res 35:e14. https://doi.org/10.1093/nar/gkl938
13. Epp LS, Boessenkool S, Bellemain EP et al (2012) New environmental metabarcodes for analysing soil DNA: potential for studying past and present ecosystems. Mol Ecol 21:1821–1833. https://doi.org/10.1111/j.1365-294X.2012.05537.x
14. Bienert F, De Danieli S, Miquel C et al (2012) Tracking earthworm communities from soil DNA. Mol Ecol 21:2017–2030. https://doi.org/10.1111/j.1365-294X.2011.05407.x
15. Willerslev E, Cappellini E, Boomsma W et al (2007) Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317:111–114. https://doi.org/10.1126/science.1141758
16. Riaz T, Shehzad W, Viari A et al (2011) ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Res 39:e145. https://doi.org/10.1093/nar/gkr732
17. Valentini A, Taberlet P, Miaud C et al (2016) Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. Mol Ecol 25:929–942. https://doi.org/10.1111/mec.13428
18. Dalén L, Lagerholm VK, Nylander JAA et al (2017) Identifying bird remains using ancient DNA barcoding. Genes (Basel). https://doi.org/10.3390/genes8060169
19. Taylor PG (1996) Reproducibility of ancient DNA sequences from extinct Pleistocene fauna. Mol Biol Evol 13(1):283–285
20. Boessenkool S, Epp LS, Haile J et al (2012) Blocking human contaminant DNA during PCR allows amplification of rare mammal species from sedimentary ancient DNA. Mol Ecol 21:1806–1815. https://doi.org/10.1111/j.1365-294X.2011.05306.x
21. Giguet-Covex C, Pansu J, Arnaud F et al (2014) Long livestock farming history and human landscape shaping revealed by lake sediment DNA. Nat Commun 5:3211. https://doi.org/10.1038/ncomms4211
22. Tillmar AO, Dell’Amico B, Welander J, Holmlund G (2013) A universal method for species identification of mammals utilizing next generation sequencing for the analysis of DNA mixtures. PLoS One 8:e83761. https://doi.org/10.1371/journal.pone.0083761
23. Nichols RV, Vollmers C, Newsom LA et al (2018) Minimizing polymerase biases in metabarcoding. Mol Ecol Res 18:927–939. https://doi.org/10.1111/1755-0998.12895
24. Seersholm FV, Pedersen MW, Søe MJ et al (2016) DNA evidence of bowhead whale exploitation by Greenlandic Paleo-Inuit 4,000 years ago. Nat Commun 7:13389. https://doi.org/10.1038/ncomms13389
25. Schnell IB, Bohmann K, Gilbert MTP (2015) Tag jumps illuminated – reducing sequence-to-sample misidentifications in metabarcoding studies. Mol Ecol Resour 15:1289–1303. https://doi.org/10.1111/1755-0998.12402
26. Rohland N, Reich D (2012) Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res 22:939–946. https://doi.org/10.1101/gr.128124.111

Chapter 17

Authentication and Assessment of Contamination in Ancient DNA

Gabriel Renaud, Mikkel Schubert, Susanna Sawyer, and Ludovic Orlando

Abstract

Contamination from both present-day humans and postmortem microbial sources is a common challenge in ancient DNA studies. Here we present a suite of tools to assist in the assessment of contamination in ancient DNA data sets. These tools perform standard tests of authenticity of ancient DNA data including detecting the presence of postmortem damage signatures in sequence alignments and quantifying the amount of present-day human contamination.

Key words Contamination, Ancient DNA, Postmortem damage, Schmutzi, DICE, mapDamage2.0

1 Introduction

DNA extracted from preserved materials has enabled unprecedented insights into the history of humans [1, 2], animals [3–6], plants [7], and microbes [8–11]. This material, referred to as ancient DNA (aDNA), is obtained through DNA extraction from bones, teeth, and other materials [12–15]. Since DNA degrades over time after the death of an organism, aDNA is characterized by short fragment size (often less than 50 bp; [13]) and postmortem damage in the form of cytosine (C) to thymine (T) and the complementary guanine (G) to adenine (A) substitutions [16, 17]. After death, tissues are colonized by microbial decomposers, which can introduce microbial DNA contamination that in some cases represents >99% of recovered DNA [18]. Finally, as little DNA tends to be preserved and much of this DNA can belong to exogenous sources (except for materials such as petrosal bones, tooth cementum, and hair; [19]), even small amounts of present-day DNA contamination can overwhelm the original, endogenous DNA in molecular biology experiments, making contamination a significant challenge for aDNA studies.

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_17, © Springer Science+Business Media, LLC, part of Springer Nature 2019


Gabriel Renaud et al.

Contamination derived from living humans is particularly problematic in the study of ancient humans, due to the high genetic similarity between modern and ancient humans. Human contamination can be introduced to ancient human samples in several ways. During excavation, for example, bones and teeth are typically handled in non-sterile environments with bare hands (with notable exceptions, such as excavations at El Sidrón Cave in Asturias, Spain; [20]). Bones and teeth are also sometimes cleaned by washing in water, which can contain skin flakes. This is a problem because hydroxyapatite, which is the main mineral component of bone, tooth enamel, and dentin, adsorbs DNA in a liquid environment [21]. Ancient samples can also be contaminated during museum storage, both through touching and through contact with other samples [22, 23].

Once in the laboratory, the risk of contamination can be reduced by working in a sterile environment. Researchers often remove or chemically treat the outer surface of bones and teeth, which can reduce surface contamination. Many reagents to be used in DNA extraction and genomic library preparation can be treated by UV radiation or with exonucleases [24].

Computational pipelines have been developed to detect the presence of contaminating DNA after DNA sequencing has been performed. In these, the more closely related the source of the aDNA is to the potential contaminating DNA, the more difficult it is to distinguish between the authentic and contaminating DNA. In archaic hominins such as Neanderthals, all mitochondrial genome sequences published so far fall outside the variation of modern humans [19, 25, 26], making present-day human contamination estimates achievable if mitochondrial coverage is sufficiently high [18, 27]. However, such estimates do not necessarily reflect the amount of nuclear DNA contamination [18].
To date, contamination estimates of nuclear DNA remain challenging, in particular for ancient, anatomically modern humans, as ancient humans can fall within the variation of present-day humans [28–32].

The molecular footprint of postmortem DNA damage can be useful for differentiating between authentically ancient DNA and present-day contaminants [33]. In a living organism, damage to the DNA strands, including via hydrolysis and oxidation, is continuously repaired [34]. After death, however, DNA is left unrepaired, and damage accumulates in predictable patterns, the signal of which is a distinguishing characteristic of aDNA. The postmortem damage most commonly associated with aDNA is the deamination of cytosine to uracil (or to thymine, if the cytosine is methylated) [17]. This deamination is more likely to occur in the single-stranded overhangs of the fragmented aDNA [34] and results in a C to T (G to A) replacement signal at the 5′ (3′) ends of aDNA sequences [17]. Different nucleotide misincorporation patterns are observed depending on the molecular tools used during library preparation [35–37] and library amplification [35, 38]. Software tools such as mapDamage [39, 40] have been developed to automate detection of these signals. However, while observing the expected damage patterns in aDNA data sets is compatible with DNA that has originated from an ancient source, it does not rule out the possibility of contamination, as mixtures of aDNA templates and fresh modern contaminants can coexist and generate bona fide patterns [41–43].

As contamination can originate from many sources (e.g., cross-contamination, microbial colonization, modern human contamination), it is crucial to assess the authenticity of aDNA sequence data sets prior to using these data for downstream analyses. Here, we describe several tools that are useful in the assessment of data authenticity in ancient DNA data sets. We begin by discussing the basic prerequisites for performing these analyses. We then describe the automated PALEOMIX pipeline (Subheading 2). Next, we describe how the signatures of postmortem damage may be examined using mapDamage2.0 (Subheading 3). Finally, we discuss how the amount of contamination may be estimated for the mitochondrial genome using schmutzi (Subheading 4) and for the nuclear genome using ANGSD and DICE (Subheading 5).
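The terminal C to T signal described above can be illustrated with a toy calculation. The sketch below is not how mapDamage is implemented; it simply counts, for each of the first few positions from the 5′ end, how often a reference C is read as a T. It assumes gap-free alignments given as (read, reference) string pairs in read orientation, and the function name and example sequences are hypothetical.

```python
def c_to_t_by_position(alignments, n_positions=5):
    """Fraction of reference C's read as T at each of the first 5' positions.

    `alignments` is an iterable of (read, reference) pairs of equal length,
    without gaps. Positions with no reference C are reported as None.
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for read, ref in alignments:
        for i in range(min(n_positions, len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [ct / total if total else None for ct, total in zip(c_to_t, c_total)]


# Three hypothetical alignments; two carry a C to T change at the first base.
toy = [("TTGACC", "CTGACC"), ("CAGTCA", "CAGTCA"), ("TCGGAT", "CCGGAT")]
frequencies = c_to_t_by_position(toy)
```

In authentic aDNA libraries, one would typically expect this frequency to be highest at the first position and to decay toward the interior of the reads, although the exact profile depends on the library preparation method.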

2 Processing Data Using the PALEOMIX Pipeline

To carry out the analyses described in the subsequent parts of this chapter, it is necessary to perform certain pre- and post-processing steps on the raw FASTQ data produced by the high-throughput sequencing (HTS) instrument. For the purpose of this chapter, we will explain processing using the “BAM pipeline,” one of the pipelines included with the PALEOMIX suite of tools [44]. This pipeline implements both the pre-processing and the post-processing steps outlined in detail in the sections below and allows for easy mapping onto multiple reference sequences of interest. For detailed installation instructions, please refer to http://paleomix.readthedocs.io/. The steps can be summarized as follows:

1. Pre-processing of reads (Subheading 2.1):
   (a) Adapter sequences must be trimmed from short DNA inserts.
   (b) Overlapping paired-end reads may be merged while taking into account individual per-base quality scores.
2. Read mapping (Subheading 2.2):
   (a) Reads must be mapped to the mitochondrial (organellar) and nuclear genomes separately.
3. Post-processing of alignments (Subheading 2.3):
   (a) Mate information must be set for paired-end alignments.
   (b) The “MD” tag must be calculated for every alignment.
   (c) Alignments must be sorted by coordinate.
   (d) PCR duplicates must be filtered.
4. Implementation with PALEOMIX (Subheading 2.4).

2.1 Pre-processing of Reads

Prior to mapping, raw reads must be trimmed of any residual adapter sequences used by the platform as sequencing primers, in order to eliminate the possibility that any such sequences interfere with downstream alignments and genotyping. This is particularly important because aDNA fragments are typically shorter than the read length used by Illumina sequencing machines, so that reads often run into, and end with, platform-specific adapter sequences [45]. When performing paired-end sequencing, it is furthermore beneficial to merge overlapping portions of the read mates in order to reconstruct the full length of the original aDNA fragment. Merging read mates in a quality-aware manner also reduces the error rate in the resulting sequences [45]. These two steps of read trimming and merging may be combined using tools specialized for aDNA, including AdapterRemoval v2 [46] and leeHom [47], as well as general tools for adapter trimming [48–50] and read merging [51, 52]. Together, these steps aim to reconstruct the sequence of the original aDNA fragment onto which Illumina-specific adapters were ligated and which was subsequently read by the sequencer.
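To make the idea of quality-aware merging concrete, the toy sketch below merges two adapter-trimmed mates by finding the longest acceptable overlap and, where the mates disagree, keeping the base with the higher quality score. It is a deliberately simplified illustration and not the algorithm used by AdapterRemoval or leeHom; the function names, thresholds, and example fragment are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_mates(read1, qual1, read2, qual2, min_overlap=11, max_mismatch_rate=0.1):
    """Merge two adapter-trimmed mates into one sequence, or return None.

    `qual1` and `qual2` are lists of per-base quality scores. The mate is
    reverse-complemented and overlaps are tried from longest to shortest.
    """
    mate = reverse_complement(read2)
    mate_qual = qual2[::-1]
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        start = len(read1) - overlap
        mismatches = sum(a != b for a, b in zip(read1[start:], mate[:overlap]))
        if mismatches > max_mismatch_rate * overlap:
            continue
        merged = list(read1[:start])
        for i in range(overlap):
            # Within the overlap, trust the base call with the higher quality.
            if qual1[start + i] >= mate_qual[i]:
                merged.append(read1[start + i])
            else:
                merged.append(mate[i])
        merged.extend(mate[overlap:])
        return "".join(merged)
    return None


fragment = "ACGTTGCATGCCGATAGCTAGGCTA"  # hypothetical 25 nt insert
read1, qual1 = fragment[:18], [40] * 18
read2, qual2 = reverse_complement(fragment[7:]), [40] * 18
merged = merge_mates(read1, qual1, read2, qual2)
```

Real tools additionally combine the quality scores of agreeing and disagreeing bases probabilistically, which is the main reason merged reads have lower error rates than either mate alone.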

2.2 Mapping to Reference Genomes

Once reads have been trimmed and (optionally) merged, mapping of the resulting sequences may be carried out using any of the numerous short-read alignment tools currently available (see [53]). These instructions make use of BWA [54], but any mapper may be substituted when carrying out this process (e.g., BWA-PSSM [55] and Bowtie2 [56]). When mitochondrial genomes are analyzed, it is important, regardless of the chosen mapper, that reads be mapped to the mitochondrial genome alone. That is to say, mapping must be carried out using only the mitochondrial genome as the target sequence, excluding any nuclear genome sequences. This is motivated by the presence of mitochondrial inserts in the nuclear genome (NUMTs), representing anything from small fragments to almost complete copies of the mitochondrial genome [57, 58]. If nuclear sequences are included in the target, the presence of NUMTs may result either in both authentic and contaminant DNA sequences mapping to the nuclear genome rather than to the mitochondrial reference sequence, or in a loss of sequence information, as nonunique hits are generally discarded in downstream analyses (see Subheading 2.3). This may hinder any attempts at calling a consensus sequence for the mitochondrial data due to the presence of gaps in the alignments (see Subheading 4).
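If the mitochondrial sequence is only available as one record within a larger multi-sequence FASTA file, a mitochondria-only target can be produced before indexing. The following Python sketch shows one way to do this; the function name and the record name used in the usage note are hypothetical, and the sketch is not part of PALEOMIX or BWA.

```python
def extract_record(fasta_text, name):
    """Return the FASTA record whose identifier equals `name`.

    The identifier is taken as the first whitespace-delimited word
    following ">" on the header line.
    """
    kept = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == name
        if keep:
            kept.append(line)
    if not kept:
        raise KeyError("no record named %r" % name)
    return "\n".join(kept) + "\n"
```

For example, calling extract_record() on the text of a whole-genome FASTA with a (hypothetical) record name such as "chrM" returns only that record, which can then be written to a separate file and used as the sole mapping target.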

Authentication of Ancient DNA Data

2.3 Post-processing of Alignments

Following sequence alignment, several additional steps have to be completed in order to prepare the results for the analyses described in subsequent sections. First, incomplete mate information in paired-end alignments, such as insert size and mate-related flags, must be added. Second, the alignments must be sorted by genomic coordinates in order to allow indexing (e.g., using “samtools index”) and therefore efficient analyses of the data by the programs described below (see Note 1). Third, the “MD” tag, containing information about the presence of mismatches relative to the reference sequence, must be calculated using the SAMtools “calmd” or “fillmd” commands [59], as this information is used by both schmutzi and DICE to infer the sequence of the reference from the alignment alone (see Subheading 4). Finally, alignments must be filtered for PCR duplicates, particularly for samples with low DNA content, as duplicate sequences may distort results from subsequent analyses via overrepresentation of duplicated reads [60].
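The logic behind duplicate filtering and coordinate sorting can be sketched in a few lines of Python. The sketch below assumes merged reads, for which both alignment ends are known, and treats alignments sharing the same reference, start, end, and strand as copies of one original fragment, keeping the copy with the highest mapping quality. The Alignment record and the selection rule are assumptions made for illustration; Picard “MarkDuplicates” applies its own, more elaborate criteria.

```python
from collections import namedtuple

# Hypothetical minimal alignment record used only for this illustration.
Alignment = namedtuple("Alignment", "name ref start end reverse mapq")


def remove_duplicates(alignments):
    """Keep one alignment per (ref, start, end, strand), sorted by coordinate."""
    best = {}
    for aln in alignments:
        key = (aln.ref, aln.start, aln.end, aln.reverse)
        # Retain the copy with the highest mapping quality.
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return sorted(best.values(),
                  key=lambda a: (a.ref, a.start, a.end, a.reverse))
```

Alignments on opposite strands are kept separately even when their coordinates coincide, since they cannot derive from the same amplified strand orientation in this simplified model.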

2.4 Implementation with PALEOMIX

Installation of PALEOMIX and its required Python modules can be completed by running the “pip” command as root:

% pip install paleomix

In addition to PALEOMIX, the following software should also be installed: AdapterRemoval v2 [46], SAMtools [59], and BWA [54]. In addition, the Picard Tools “picard.jar” file must be downloaded and placed in ~/install/jar_root/. This software is used by the BAM pipeline, an automated pipeline for processing and mapping sequencing reads that is included as part of PALEOMIX. mapDamage2.0 [40] and the Genome Analysis Toolkit (GATK) [61] are optional components for the BAM pipeline, but we will not use these here. To start a new mapping project, create a default configuration file using the output of the built-in “makefile” command in the PALEOMIX “BAM pipeline”:

% bam_pipeline makefile > project.yaml

As indicated by the extension, the makefile is based on the YAML markup language (http://www.yaml.org/), a text format for organizing data that is both easy to read and easy to manipulate using any standard text editor. The following instructions will focus on mapping to the mitochondrial genome and assume that the mitochondrial genome is located in a read/writable directory at /genomes/mitochondria.fasta, but this may be substituted by any other path. To specify that the aDNA fragments should be mapped to this genome, open the “project.yaml” file in a text editor (vim, emacs, nano, sublime, etc.), and locate the section of the file starting with “Prefixes:”. There, specify the name and location of the genome that we wish to map to (named “Mitochondria” in the following example):

Prefixes:
  Mitochondria:
    Path: /genomes/mitochondria.fasta

Any number of reference genomes (called “Prefixes” in the pipeline), in the form of FASTA files containing one or more sequences, may be specified here (the .fasta extension is mandatory). Note that in YAML files, the indentation (none for the first line, one level for the second line, and two levels for the third line) is used to define the structure of the data. This indentation must consist purely of spaces, as the format prohibits the use of tabs for indentation. Next, specify the samples to be mapped by specifying five pieces of information for each sample: (1) the filename to use for output files generated by the pipeline, (2) the name of the sample, (3) the name of the library from which reads have been sequenced, (4) the name of the sequencing run to be processed, and (5) the location of the files containing the sequencing data. For example, assuming that we had downloaded the sample SRR123456 from the Sequence Read Archive and saved the paired-end FASTQ reads in files SRR123456_1.fastq.gz and SRR123456_2.fastq.gz, for mate 1 and mate 2 reads, respectively, these could be specified as follows, at the end of the configuration file:

SRR123456:
  SRR123456:
    Library1:
      ERR123456: /path/to/SRR123456_{Pair}.fastq.gz

The “{Pair}” value is used to signify that this is paired-end data, and the pipeline will expect the mate 1 and mate 2 reads to be located at the path generated by replacing this value by 1 and 2, respectively (here /path/to/SRR123456_1.fastq.gz and /path/to/SRR123456_2.fastq.gz). In the case of single-end reads, if the FASTQ reads are stored in file /path/to/SRR123456.fastq.gz, for instance, this value would be omitted:

SRR123456:
  SRR123456:
    Library1:
      ERR123456: /path/to/SRR123456.fastq.gz

Multiple libraries and/or runs can be specified for each sample, as described in the documentation for the BAM pipeline.
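The substitution described for the “{Pair}” value can be mimicked with a short Python sketch, which may be useful for checking that the expected input files exist before launching the pipeline. The function name is hypothetical and the sketch only reproduces the replacement rule stated above; it is not the code used by PALEOMIX.

```python
def expand_pair(path_template):
    """Return the FASTQ paths implied by a makefile path.

    A path containing "{Pair}" denotes paired-end data and expands to the
    mate 1 and mate 2 files; any other path denotes single-end data.
    """
    if "{Pair}" in path_template:
        return [path_template.replace("{Pair}", mate) for mate in ("1", "2")]
    return [path_template]
```

For the paired-end example above, expand_pair("/path/to/SRR123456_{Pair}.fastq.gz") yields the two mate files, whereas the single-end path is returned unchanged.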

Finally, locate the “Features” section and disable the use of mapDamage and GATK by replacing the value “yes” with “no” for both “mapDamage” and “RealignedBAM,” as shown here:

Features:
  RealignedBAM: no
  mapDamage: no
  ...

Other options in the “Features” section are not shown here for the sake of brevity, and these should be left as is. Once this has been done, the pipeline can be run using the “bam_pipeline run” command:

% bam_pipeline run project.yaml

Running the BAM pipeline in this manner will:

1. Trim remaining adapter sequences and merge overlapping reads (see Note 2).
2. Map the trimmed reads onto the listed reference sequences.
   (a) If these references have not already been indexed, indexing is performed using the selected short-read aligner (here BWA).
3. Remove any reads not mapping to the reference genome (see Note 3).
4. Process mapped reads using the SAMtools “calmd” and “fixmate” commands, to ensure that mate information is correct and that the “MD” tag is present.
5. Remove reads identified as PCR duplicates (here using the Picard “MarkDuplicates” command).
6. Index the BAM for fast retrieval of arbitrary regions.

Based on the settings listed above, the resulting BAM file, containing alignments against the mitochondrial genome, will be located in the current directory with the filename “SRR123456.mitochondria.bam.” This file is suitable for analysis following the instructions in Subheading 4.2, which aim at quantifying present-day human contamination from mitochondrial data and at obtaining a consensus call for the endogenous mitochondrial genome. A BAM file aligned to the nuclear genome using the same methodology can be used for the analyses described in Subheading 5 to quantify present-day human contamination for the nuclear genome.

3 Estimating Damage Parameters and Fragmentation Patterns

Postmortem damage patterns are routinely used in assessing aDNA authenticity. Here we discuss mapDamage2.0 [40], which offers several tools for visualizing and modeling the patterns of postmortem damage observed in ancient samples. mapDamage2.0 also allows users to recalculate base quality scores to mitigate the impact of postmortem damage on downstream analyses. In this section, we focus on the basic plotting of error rates and fragmentation patterns in a BAM file. The modeling and plotting of postmortem DNA damage parameters and the rescaling of base quality scores are not required for the purpose of authenticating aDNA. Basic plotting may be performed on any BAM file (as described in Subheading 2) and makes no assumption about the presence or absence of postmortem DNA damage in the sample. It is recommended to start any analyses of aDNA by performing a basic mapDamage plot; this not only serves to determine if aDNA is present but may also help detect systematic sequencing errors and biases through visual inspection of the error and fragmentation plots (see Notes 4–7). Note that most of these plots are not as useful for extracts that have been treated with a “USER” treatment: a combination of endonuclease VIII and uracil-DNA glycosylase (UDG), as this treatment serves to erase the signature of postmortem damage [30, 62] (see Note 8).

3.1 Postmortem Damage and Fragmentation Plots

Before proceeding, install mapDamage2.0 and its dependencies as described at http://ginolhac.github.io/mapDamage/. To carry out basic plotting using mapDamage2.0, two files are required: a reference sequence in FASTA format and a BAM file containing reads mapped (alignments) to that reference sequence. The BAM file must fulfill the requirements described in Subheading 2, with the exception of the “MD” tag information, which is not required. Once these requirements are met, basic plotting may be carried out using the following command (shown here for the example BAM produced in Subheading 2):

% mapDamage -r /genomes/mitochondria.fasta -i SRR123456.mitochondria.bam --no-stats

The --no-stats option is required to disable the modeling of postmortem DNA damage parameters, which is otherwise carried out by default. Running this command will create a folder named results_SRR123456.mitochondria/ in which the output files are placed (alternatively, this destination may be set using the -d command line option). Note that enabling the mapDamage feature by manually editing the Features section of the PALEOMIX makefile and rerunning the pipeline would result in running the same analysis:

Features:
  RealignedBAM: no
  mapDamage: yes
  ...

% bam_pipeline run project.yaml

The primary outputs are the plots “Fragmisincorporation_plot.pdf” and “Length_plot.pdf.” The first of these, the “Fragmisincorporation_plot.pdf” file, plots the base composition around the 5′ and 3′ termini of aligned DNA sequences, as well as the rates of C to T and G to A mismatches observed relative to the reference genome across the alignments (see Note 6).

3.1.1 Postmortem Damage Plots

For aDNA libraries produced using blunt-end adapter ligation to double-stranded DNA templates and single-stranded fill-in following end-repair [63], this plot is expected to show an increase in C to T and G to A mismatches when approaching the 5′ and 3′ termini, respectively (Fig. 1b). Modern DNA, on the other hand, including contamination resulting from the handling of samples and/or introduced during library preparation, is expected to show C to T and G to A mismatches in line with other substitution rates (Fig. 1a). As such, the presence of postmortem damage provides powerful evidence of the presence of aDNA in the sample [64]. This signal may, however, be influenced by the choice of library building protocol used during the sequencing of the ancient sample. In particular, the use of the A-tailing (AT) ligation protocol is known to reduce the rate of postmortem DNA damage observed at the first and the last position in aDNA fragments (Fig. 1c) [35], due to ligation bias against templates starting with thymine analogs, such as uracil. G to A substitutions at the 3′ termini of aDNA fragments are a product of the end-repair step, in which the missing DNA segments complementary to 5′ overhangs are repaired, introducing adenines when facing uracils (instead of guanines at non-deaminated cytosines). However, for protocols targeting individual strands of aDNA and not involving 3′-end fill-in reactions (such as the single-stranded library preparation method [36]), only C to T patterns of postmortem DNA damage are observed: the 3′ G to A pattern is replaced by a 3′ C to T pattern, almost symmetrical to that observed at 5′ ends, because in the absence of end-repair, 3′ overhangs, which also accumulate uracils, are not removed.
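The quantity shown on such a plot can be made concrete with a small Python sketch that computes, for each of the first positions from the 5′ terminus, the fraction of reference cytosines read as thymine. The sketch assumes gap-free alignments given as pairs of equal-length strings on the forward strand and normalizes by the number of reference cytosines at each position; the function name is hypothetical, and mapDamage2.0 performs this tabulation itself from the BAM file.

```python
def five_prime_c_to_t(pairs, n_positions=10):
    """Per-position C-to-T frequency at the 5' end of aligned fragments.

    `pairs` is an iterable of (reference_segment, read) strings of equal
    length. Returns a list with one frequency per position, or None where
    no reference cytosine was observed.
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference segment and read must have equal length")
        for i in range(min(n_positions, len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [hits / total if total else None
            for hits, total in zip(c_to_t, c_total)]
```

For a damaged double-stranded library, the values returned for the first positions would be expected to exceed those further into the fragment, mirroring the curve in Fig. 1b.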
While diagnostic for the presence of aDNA, mismatches resulting from the presence of postmortem DNA damage are also detrimental to efforts to genotype ancient specimens, and “USER” treatment has been developed to eliminate the presence of such DNA damage in ancient samples, at the cost of some of the material, through the targeted excision of uracils [62, 65]. However, despite the elimination of uracils in the template molecules, the sequencing of ancient, USER-treated samples can still show a slight signal of C to T at the 5′ and 3′ termini (Fig. 1d). This signal is mostly driven by the presence of methylated CpG epi-alleles in the sequence data [66–68]. It should be noted that contaminating sequences can also have appreciable amounts of postmortem DNA damage in samples that were excavated long ago and/or have been subject to chemical treatment, such as museum specimens and medical samples [6, 69]. Similarly, hosts and their pathogens can show different levels of DNA damage, despite being exposed to identical preservation conditions [70]. As such, while the presence of postmortem DNA damage is indicative of the presence of aDNA, it alone cannot establish data authenticity.

[Figure 1: four panels (a–d) plotting substitution frequency (y-axis “Frequency,” 0.00–0.12) against position from the 5′ terminus (1 to 10) and from the 3′ terminus (−10 to −1); panel titles SRR959258, SRR959264, SRR959266, and Denisovan; legend C>T, G>A]

Fig. 1 mapDamage2.0 misincorporation plots for modern DNA (a); aDNA sequenced using end-repair, blunt-end adapter ligation, and nick fill-in (b); aDNA sequenced using end-repair, adapter ligation at AT-overhangs, and nick fill-in, characterized by a decrease in the observed postmortem DNA damage rate at the first and last position (c); and aDNA that has been USER-treated (d). Red represents the rate of C to T substitutions relative to the reference, blue represents the rate of G to A substitutions. Samples (a–c) were sourced from [35], sample (d) was sourced from [36]

[Figure 2: schematic with the columns “Genotype,” “Contamination,” and “Observed aDNA fragments,” showing endogenous and contaminant haplotypes at 0%, 50%, and 100% contamination]

Fig. 2 Schematic representation of the problem of determining the haplotype of the endogenous mitochondrial genome in the presence of present-day human contamination for aDNA data sets. The left column represents the true haplotype of the endogenous and contaminant mitochondrial genomes. The rightmost column shows the observed aDNA fragments aligned to the mitochondrial reference

3.1.2 Base Composition Plot

As postmortem DNA fragmentation appears to be mainly driven through depurination [17], the base composition of the genomic positions immediately preceding aDNA fragment starts is nonrandom and enriched in purines. Users can check for the presence of this feature in their data, directly from the fragment misincorporation plot provided by mapDamage2.0 [40], where the top four plots provide base composition profiles within the first ten and last ten fragment positions, as well as in their respective flanking 10 bp regions in the reference genome. The exact mechanisms driving DNA fragmentation through depurination postmortem are still unclear. However, tracking base compositional profiles within a sample set of 80 bones spanning the last 60,000 years, Sawyer and colleagues [71] have observed depurination mainly through adenine residues in their most recent samples ( Makefiledice

where [path to dice folder]/alleleFreqNuc/ contains the allele frequency files and [population code of anchor] is the file prefix for the population used as anchor in the allele frequency folder. For instance, to use the West African Yoruba as the anchor population, one can use --anch YRI. The regions to include are specified by --reg, and we recommend using, for example, [path to dice folder]/mapability/all.1kregions.gz, as these are regions with a high mappability score (see http://lh3lh3.users.sourceforge.net/snpable.shtml). The --alfr option specifies the allele frequency folder. A readily available set of allele frequencies for various populations can be downloaded using the Makefile in DICE’s main directory ([path to dice folder]/alleleFreqNuc/). The [human genome reference] should be the same as that used for mapping. Currently, the dice2Makefile.pl script accepts the following options that can modify the computations:

• --cont: This option allows the user to specify the populations to be used as putative sources of present-day human contaminant. By default, the program takes all the populations available in a directory. If computational resources are limited, restricting the number of putative contaminants might be an adequate option.

• --tau: This refers to the two drift parameters (tauA and tauC) separating the ancestral population from the population to which the ancient genome belongs and from the present-day population. The drift parameter is equal to the time in generations divided by the effective population size in each population. By default, DICE explores a parameter space between a minimum of 0 and a maximum of 1. It is possible that the MCMC chains have reached the upper bound in the parameter space, when one of the daughter populations has a very high drift time separating it from the ancestral population. This can be seen if the estimates for either one of the tau parameters reach 0.99 and remain blocked in this state. If that is the case, the value of the tau parameters can be increased above 1 using this option.

The commands can now be launched using the following:

% make -f Makefiledice -j [number of cores]

where [number of cores] is the number of threads to run simultaneously on a multicore machine. This process will create, for each BAM file, an output file with the suffix _Cont_[CNT]_Anch_[ACH].dice.out.gz containing the posterior samples from the MCMC chain. [CNT] and [ACH] are the population codes for the present-day human contaminant and for the anchor, respectively. Another file is produced with the suffix .dice.txt, which contains posterior modes for each contaminant population and reports the contaminant population showing the highest posterior probability. Another program can be used to summarize this information as bar plots with whiskers to represent the 95% posterior confidence intervals:

% logs2plot.R [prefix]_Cont_*_Anch_[ACH].dice.out.gz

where [ACH] is the population code for the anchor population and [prefix] is the file prefix for all output files for a given BAM file. This command creates files named [prefix]_c.pdf containing the estimates for the contaminant populations, sorted with respect to the posterior probability. Comparisons between the different contaminant populations should only be made when the same anchor population is used, hence the use of a fixed [ACH] parameter. This command is part of the Makefile and is run automatically.
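When many contaminant populations are tested, it can be convenient to recover the population codes from the output filenames described above. The following Python sketch parses the _Cont_[CNT]_Anch_[ACH].dice.out.gz suffix; the function name and the example filename in the usage note are hypothetical, and the sketch assumes that population codes contain no underscores or dots.

```python
import os
import re

# Matches [prefix]_Cont_[CNT]_Anch_[ACH].dice.out.gz as described in the text.
DICE_OUTPUT = re.compile(
    r"^(?P<prefix>.+)_Cont_(?P<contaminant>[^_]+)_Anch_(?P<anchor>[^_.]+)"
    r"\.dice\.out\.gz$"
)


def parse_dice_output(path):
    """Return (prefix, contaminant code, anchor code), or None if no match."""
    match = DICE_OUTPUT.match(os.path.basename(path))
    if match is None:
        return None
    return match.group("prefix"), match.group("contaminant"), match.group("anchor")
```

For instance, a file named sample1_Cont_CEU_Anch_YRI.dice.out.gz would be parsed as prefix sample1, contaminant CEU, and anchor YRI, which makes it straightforward to check that all files being compared share the same anchor.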

5.3 Limitations

The probabilistic model implemented as part of DICE will produce posterior samples of the drift, error, and contamination parameters under a given set of anchor and contaminant populations. The user can use different contaminant panels and identify which panel yields the highest posterior probability. Though the panel with the highest posterior probability is not statistically guaranteed to be the most likely contaminant, simulations show that this is the case for the range of demographic scenarios tested [79]. DICE can return reliable posterior samples with an ancient genome of at least threefold coverage, as long as the drift times separating the endogenous and present-day human contaminant are sufficiently large, for example, as in Neanderthals. However, DICE is not likely to produce reliable posterior samples if the ancient genome and the contaminant are closely related.
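The drift times mentioned here follow the definition given for the --tau option: time in generations divided by effective population size. The Python sketch below computes this quantity and applies a simple check for a chain that has become stuck near the upper bound of the parameter space. Both function names, the margin of 0.01 (matching the value of 0.99 mentioned for a default bound of 1), and the number of trailing samples inspected are assumptions made for illustration; they are not part of DICE.

```python
def drift_time(generations, effective_size):
    """Drift parameter as defined in the text: generations / effective size."""
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    return generations / effective_size


def stuck_at_upper_bound(samples, upper=1.0, margin=0.01, tail=100):
    """Return True if the last `tail` posterior samples all sit at the bound."""
    recent = list(samples)[-tail:]
    return bool(recent) and all(value >= upper - margin for value in recent)
```

As a usage example under these assumptions, 2000 generations in a population of effective size 10,000 correspond to a drift of 0.2, well inside the default range, whereas a chain whose recent tau samples all lie at about 0.99 would suggest rerunning with a larger --tau value.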

6 Conclusions

The goal of this subsection is to give end users tools to ascertain the authenticity of sequence data recovered from hominin samples where downstream analyses can be affected by present-day human contamination. Other software exists to explore other aspects of contamination, such as contamination introduced during processing. For example, metaBIT [82] can be used to validate the quality of multiple libraries constructed from the same extract by comparing their microbial diversities. Ideally, different libraries made from the same extract should have similar microbial diversity. However, contamination events during laboratory work could shift the microbial diversity toward the contaminating source, which can be detected by metaBIT. As the field of aDNA expands, the need for cutting-edge data authentication methods also increases. It is important to keep in mind that such contamination estimates are not a replacement for negative controls during data production. Additionally, the techniques described above can yield false detection of contaminants. The use of certain molecular tools, such as USER, during sample preparation, or young samples with good molecular preservation, may lead to little or no nucleotide misincorporation [71]. Such factors can lead to false positives whereby the material is endogenous but flagged as contaminant.

7 Notes

7.1 Mapping and mapDamage

1. Due to the circularity of the mitochondrial genome, fragments that span the junction in the reference in fasta format might not be aligned properly. This can lead to a spurious drop in coverage and a lack of resolution of those regions. A possible way to mitigate this is to extend the mitochondrial reference by copying the initial 1000 bp to the end of the fasta reference and mapping the aDNA fragments against this new reference. A tool like bam-rewrap (https://bitbucket.org/ustenzel/biohazard) can allow users to specify the initial length of the mitochondria and copy the alignments exceeding the original length back to their original location at the beginning of the reference.
2. It is important that the correct adapter sequences be specified when running the PALEOMIX pipeline or when trimming sequences using another program. PALEOMIX will, by default, trim standard Illumina paired-end adapter sequences. Please refer to the PALEOMIX and AdapterRemoval documentation for how to correctly specify the adapters used during sequencing.
3. It is possible that only a small number of aDNA fragments align to the endogenous reference. This may be because of the following:
   (a) If mapped reads are spread randomly across the genome, the library has low endogenous content or very low complexity.
   (b) If mapping coverage is unequal across the genome, the reads may be microbial or other fragments aligning randomly to the reference. Especially for short microbial reads, the probability of aligning to a reference by chance is higher than for longer fragments [83].
   (c) If mapping coverage is higher in exonic regions, the reference might be too evolutionarily distant.
4. If the fragment length distribution drops precipitously around the read length and single-end reads were used, this is expected, as it is impossible to measure longer fragment lengths without the use of paired-end reads.
5. If a large peak is observed at a particular length in the fragment length distribution, it is recommended to look at the fragments of that very specific length. It is possible that such sequences are merely chimeric adapters, except where the peak corresponds to the raw read length.
6. If no deamination is seen and no library preparation protocol can explain this lack of deamination, it is possible that the aligned fragments are present-day human contamination. When working with hominins, present-day human contamination can align at the same rate as the endogenous material. In that case, one can determine the mitochondrial haplogroup from the mapped reads and explore whether this makes sense given the origin of the sample. However, assumptions about the demography of the endogenous sample can often be misleading.

7. As of version 2.0, mapDamage will generate a table of per-position mismatches for each sequence in the reference genome. For poorly assembled genomes, this may result in exceedingly large tables and even program failure, and using the --merge-reference-sequences option is recommended. This will record all per-position mismatches in a single table.

7.2 Estimating Contamination

8. As mentioned above, the lack of postmortem damage can indicate the absence of endogenous aDNA. If the USER treatment was performed, it is possible to have very little residual deamination. If that is the case, it will be difficult for endoCaller to have sufficient statistical power to predict the endogenous base versus the contaminant one. If the endogenous mitochondrial genome is not predicted accurately, estimates of present-day human contamination will be unreliable. It is therefore recommended to try to (a) predict a consensus using endoCaller, (b) ascertain the phylogenetic placement of that consensus using a tool such as HaploGrep2.0 [76], and (c) look for private mutations (i.e., unique to that sample or defining mutations for the alleged subhaplogroup).
9. It is possible that the iterative function of schmutzi does not converge, probably because the algorithm has identified two almost equally likely models. It is recommended to run endoCaller with a low estimate of present-day human contamination and again with a higher estimate. Both predictions should be compared for divergent sites, and phylogenetic information should be used to determine which consensus is the most likely.
10. When using endoCaller, the program can predict the contaminant (using option -single), and the iterative procedure can use this prediction as a record in the database of putative contaminants (usually specified via [path to schmutzi]/eurasian/freqs/). This is normally done by default by schmutzi.pl. The final present-day human contamination obtained using default parameters might differ from the one obtained without the prediction of the contaminant (see option --notusepredC). If the estimate obtained using the option --notusepredC is higher than when enabling contaminant prediction, the first estimate is more likely correct, as using the --notusepredC option tends to underestimate present-day human contamination. If the prediction of contaminant is enabled and produces a high estimate (above ~20%), there are two options. If the estimate obtained using the option --notusepredC is low (below ~5%), the first estimate, without the use of this option, is likely an overestimate and probably wrong. If the estimate obtained using the option --notusepredC is high (above ~10%), then it may be an underestimate, and the estimate obtained without using the option --notusepredC is likely correct.

11. The methods to quantify present-day human contamination using the X chromosome implemented in ANGSD can work at a depth of coverage as low as 0.5 on chromosome X if contamination is lower than 10%. Please note that this method is aimed at male individuals only. This method also underestimates contamination rates above 15% unless the depth for chromosome X is greater than 1.
12. The contamination estimate implemented in ANGSD is underestimated when the assumption about the “contaminant population” is wrong. This is the case, for instance, when the actual contaminant is of European ancestry but the method is run using African frequencies. Therefore, it is recommended to use different panels, which can provide some insight into the ancestry of the contaminant.
13. When using DICE, it is possible that there are major discrepancies between the estimates of contamination when using different populations as the putative contaminant source. Ideally, when ordering the contamination rates with respect to the posterior mode, the populations that are closest to the true contaminant should have higher posterior modes. If, for instance, the contaminant individual had Han ancestry, then CHB should be the contaminant with the highest posterior mode, followed by CHS, JPT, followed by other Asians, Europeans, and finally Africans. It is likely that if there are major discrepancies for the present-day human contamination estimate (in the order of 15–20%) between various populations, the algorithm has not converged to the true posterior distribution and the estimates may be unreliable. This can be the case with very low-coverage samples.
14. When running DICE, the user has to pick a population as anchor. Please note that different putative contaminant populations should be compared when using the same anchor population. The anchor population is used to compute the drift parameters linking it to the ancient sample. Ideally, a population that has not received gene flow from the ancient sample should be used. For instance, the Yoruba (YRI) anchor can be used if the ancient sample is a Neanderthal with potential European contamination. It is also possible that the default boundaries for the drift parameters might not be suitable for a particular ancient sample when using DICE (please see the comment about the tau parameter in the dice2Makefile.pl section). This will be seen when the MCMC chain reaches the upper bound for one of the tau parameters, which is set by default at 1.0. This problem can be circumvented by increasing the range of the parameter space for tau via the --tau option in dice2Makefile.pl.

15. There is also the possibility of admixture from the archaic population, to which the ancient sample belongs, into the anchor population, which might result in incorrect estimates of present-day human contamination. To account for this, DICE also offers the possibility of estimating admixture from the ancient samples via parameters -3p, -aR, and -aT of the main executable program. Please refer to the software manual for further information regarding these parameters.
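The mitigation described in Note 1 for circular mitochondrial references can be sketched in Python as follows. The first function appends the initial bases of the reference to its end, and the second shows how a coordinate on the extended reference relates to the original one. The function names and the 0-based coordinate convention are assumptions made for illustration; the sketch does not reproduce the behavior of bam-rewrap, which operates on alignments rather than single positions.

```python
def extend_circular_reference(sequence, pad=1000):
    """Append the first `pad` bases to the end of a circular reference."""
    if not 0 < pad <= len(sequence):
        raise ValueError("pad must be between 1 and the reference length")
    return sequence + sequence[:pad]


def rewrap_position(position, original_length):
    """Map a 0-based position on the extended reference back to the original."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return position % original_length
```

With a pad of 1000 bp, fragments spanning the junction can align contiguously to the extended reference, and positions beyond the original length are then reported at the corresponding location at the start of the genome.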

Acknowledgments

We would like to thank Fernando Racimo for comments and suggestions and José Victor Moreno Mayar and Thorfinn Sand Korneliussen for their insights into the contamination method using the X chromosome. This work was supported by the Danish Council for Independent Research, Natural Sciences (Grant 400200152B); the Danish National Research Foundation (Grant DNRF94); Initiative d’Excellence Chaires d’attractivité, Université de Toulouse (OURASI); the Villum Fonden miGENEPI research project; and the European Research Council (ERC-CoG-2015681605).

Chapter 18

Assembly of Ancient Mitochondrial Genomes Without a Closely Related Reference Sequence

Christoph Hahn

Abstract

Recent methodological advances have transformed the field of ancient DNA (aDNA). Basic bioinformatics skills are becoming essential requirements to process and analyze the sheer amounts of data generated by current aDNA studies and in biomedical research in general. This chapter is intended as a practical guide to the assembly of ancient mitochondrial genomes, directly from genomic DNA-derived next-generation sequencing (NGS) data, specifically in the absence of closely related reference genomes. In a hands-on tutorial suitable for readers with little to no prior bioinformatics experience, we reconstruct the mitochondrial genome of a woolly mammoth deposited ~45,000 years ago. We introduce key software tools and outline general strategies for mitogenome assembly, including the critical quality assessment of assembly results without a reference genome.

Key words Ancient DNA, Genome assembly, Bioinformatics, Mitogenomics, Next-generation sequencing

1 Introduction

The emergence of next-generation sequencing (NGS) technologies has revolutionized evolutionary biology. NGS approaches as implemented by Illumina, Pacific Biosciences (PB), and Oxford Nanopore (ONP) sequencing platforms generate massive amounts of sequence data at low cost, democratizing genomics research. The field of ancient DNA has also been transformed by the recent methodological advances, with large-scale genomic data yielding new insights into the evolutionary histories of extinct species and enigmatic ancestors of the extant organismal diversity [1–5]. Ancient mitogenomics in particular has become a pillar of modern ancient DNA research, thanks to the higher representation of mitochondrial DNA compared to nuclear DNA in genomic data sets, the small size of most mitochondrial genomes, and the comparatively simple structure and inheritance pathway of mitochondrial genomes [6–12].

Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1_18, © Springer Science+Business Media, LLC, part of Springer Nature 2019


The most widely used NGS platform in the aDNA field is Illumina, which is characterized by generating short (500 GB) and specialized computing infrastructure.

2.3 Software

Most bioinformatics software runs on UNIX-based systems such as Linux or Mac OS. The tutorial below was developed and tested on Ubuntu 16.04 (64-bit), and all software used is freely available. A detailed list of the software tools and versions used in the tutorial is given in Table 1; instructions for installation can be found below. All required software (except the graphical tool Tablet) has also been bundled in a dedicated Docker container (https://hub.docker.com/r/chrishah/amito_mimb/). Docker (https://www.docker.com) provides installation packages that facilitate integration with Windows and Mac operating systems and should thus allow the workflow demonstrated here to be used independently of the operating system.


Table 1 Detailed list of software used in the tutorial (all last accessed 16 Jan 2019)

Software package   Version      Available from URL
BLAST+             2.2.31+      ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
MITObim            1.9.1        https://github.com/chrishah/MITObim
MIRA               4.0.2        https://sourceforge.net/projects/mira-assembler/
MUSCLE             3.8.31       http://drive5.com/muscle/
Perl               5.22.1       http://www.perl.org/
Python             2.7.12       https://www.python.org/
Tablet             1.16.09.06   https://ics.hutton.ac.uk/tablet/

3 Methods

The following tutorial details strategies for assembling the mitochondrial genome of a woolly mammoth (Mammuthus primigenius) directly from genomic sequence read data using the MITObim pipeline [16] (see Note 1). The MITObim pipeline is designed to reconstruct complete mitochondrial genomes from genomic short-read sequence data without the need for a close reference genome. The pipeline initially identifies genuine mitochondrial reads based on sequence similarity to mitochondrial bait sequences (full mitochondrial genomes or short fragments) from potentially distantly related species [16]. These reads are then assembled using the MIRA assembler [17]. The process is repeated iteratively, using the assembly result from the previous iteration as bait sequence to identify more mitochondrial reads. Given the circular nature of the mitochondrial genome, the number of reads baited is expected to plateau once the molecule is fully reconstructed.

In the tutorial, I use mitochondrial sequence data from the West Indian manatee (Trichechus manatus) as bait. The most recent common ancestor (MRCA) of M. primigenius and T. manatus is thought to have lived in the late Cretaceous, some 75 million years ago [19]. Theoretically, one single short region of length k base pairs that is identical between the bait sequence and the mitochondrial genome in question is sufficient to seed a robust and successful mitochondrial reconstruction (Fig. 1).

[Figure 1: scatter plot. x-axis: tmrca [My] of the mitochondrial bait (0–140); y-axis: MITObim iterations (0–80); data points labeled Asian elephant, African elephant, manatee, aardvark, tenrec, tamandua, blue whale, chimp, and opossum.]

Fig. 1 Number of MITObim iterations required for reconstructing the woolly mammoth mitochondrial genome (average across ten randomly drawn subsets of 20% of the read data, ENA accession: ERR852028) dependent on the divergence from the mitochondrial reference. tmrca, time to most recent common ancestor (estimates from ref. 19)

The tutorial is mainly command line based. Basic prior experience with the Unix shell is advantageous but not necessary for successful completion of the tutorial. Below, I provide code cells detailing commands to be executed in the Unix shell. “$” symbols at the beginning of lines symbolize the prompt of the Unix shell and are not part of the command. “\” symbols are often inserted at the end of command lines to increase readability and indicate the continuation of the same command in the next line. You may write out lines linked via “\” as a single one-line command. Lines starting with “#” are comments that are intended to help the user understand the following code.

The tutorial makes extensive use of variables; in the Unix shell, these are first defined using the syntax “variable=sometext” or “variable=10” and subsequently called via “$variable”. While this may at first appear complicated to users with little prior experience in bioinformatics, the intention is to make the strategy described in this chapter easily applicable to any other data set. To apply the same workflow to other data, simply adjust the variable declarations at the start of the session (e.g., filenames and file locations), and, in principle, no further changes should be needed to the actual commands. However, successful reconstruction of mitochondrial genomes from short-read data will depend on the data (depth of coverage, quality) and the characteristics of the particular genome, such as GC content and the presence of repeats.
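As a minimal illustration of the conventions just described, the following sketch defines two variables, combines them, and continues a command on the next line with “\”. The prompt symbol is omitted so that the lines can be pasted directly. The variable values are taken from the tutorial; the echo command is only a stand-in for a real analysis command.

```shell
# define variables (no spaces around "=")
sample=WoollyMammoth
ref_id=Trichechus_manatus.AM904728.full

# combine variables; braces mark where a variable name ends
prefix=${sample}-${ref_id}

# a long command continued on the next line with a trailing "\"
echo "analysis directory: $sample.$ref_id" \
     "output prefix: $prefix"
```

Changing the two definitions at the top is all that is needed to reuse the remaining lines for another sample or bait.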


3.1 Preparing Example Data/Software

Begin by downloading a dedicated GitHub repository, which contains the reference sequences to be used in the tutorial (see https://github.com/chrishah/aMITO_MiMB; permanently archived at DOI:10.5281/zenodo.1407986), and the Illumina read data for the Oymyakon mammoth. These data are available from the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under the accession ERR852028. The download will take around 90 min, depending on your download speed.

# download the dedicated repository, decompress
# and change to a new working directory
$ wget https://github.com/chrishah/aMITO_MiMB/archive/v1.0.zip
$ unzip v1.0.zip
$ cd aMITO_MiMB-1.0/tutorial/
# create the directory that will contain the read data
$ mkdir raw_reads/
# download the Illumina read data from ENA
$ acc=ERR852028
$ wget -O raw_reads/$acc.fastq.gz \
  ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR852/$acc/$acc.fastq.gz
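After a long download it can be worth confirming that the compressed file is intact and seeing how many reads it holds. The sketch below is one possible way to do this and is not part of the original tutorial; count_fastq_reads is a hypothetical helper name, and the function assumes the usual four lines per FASTQ record. It is demonstrated on a tiny synthetic file rather than on the mammoth data.

```shell
# count_fastq_reads FILE.fastq.gz
# prints the number of reads, assuming four lines per FASTQ record
count_fastq_reads() {
    gzip -t "$1" || return 1          # fail early if the archive is corrupt
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# demonstration on a tiny synthetic file (two reads)
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmpdir/toy.fastq.gz"
n_reads=$(count_fastq_reads "$tmpdir/toy.fastq.gz")
echo "reads: $n_reads"
rm -r "$tmpdir"
```

In the tutorial session the same function could be called as count_fastq_reads raw_reads/$acc.fastq.gz.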

MITObim makes use of the MIRA assembler [17], as well as additional programs from the MIRA assembly suite. MITObim ships with precompiled binaries for MIRA, in case MIRA is not already installed on your system. Additionally, for evaluation of the mitochondrial genome reconstructions, I recommend the BLAST+ command line applications [20]. In many Linux distributions (e.g., Ubuntu), BLAST+ is readily available from public repositories. A range of solutions exists for multiple sequence alignment. In the tutorial below, we will be using MUSCLE [21]. On Ubuntu, MUSCLE can also be obtained from the public repository. An example of a visual tool for the exploration of NGS data is Tablet [22].

In the remainder of this section, I will detail the minimum software installation steps, assuming a recent Ubuntu distribution. Alternatively, if you decide to use the Docker container mentioned above, the following steps can be skipped, since it ships as a self-contained environment with all software (except the graphical tool Tablet) preinstalled. Instructions on how to launch the Docker container are given in the dedicated GitHub repository (https://github.com/chrishah/aMITO_MiMB).

# download the MITObim repository
$ wget https://github.com/chrishah/MITObim/archive/v1.9.1.zip
$ unzip v1.9.1.zip
# test MITObim and display usage information
$ ./MITObim-1.9.1/MITObim.pl -h
# if MIRA is not installed globally you can use binaries
# that come with the MITObim repository
$ ./MITObim-1.9.1/docker/external_software/mira_4.0.2/mira -v

The executables for MITObim and MIRA are now stored locally in a particular location on your machine. To call them globally, i.e., from anywhere on your system without having to specify these locations, you can add them to the environment variable “$PATH” for the current session.

# add MITObim to the PATH
$ cd MITObim-1.9.1
$ PATH=$PATH:$(pwd)
# add MIRA
$ PATH=$PATH:$(pwd)/docker/external_software/mira_4.0.2
# MITObim comes with scripts that we will also add
$ PATH=$PATH:$(pwd)/misc_scripts
$ export PATH
# return to the base working directory
$ cd ..
# download and install blast+ on Ubuntu (might require 'sudo')
$ apt-get install ncbi-blast+
# test blastn and display usage information
$ blastn -h
# download and install muscle on Ubuntu (might require 'sudo')
$ apt-get install muscle
# test muscle and display usage information
$ muscle -h
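Before starting the analysis, it may help to confirm that every program can actually be found via the PATH. The following sketch is an optional check that is not part of the original tutorial; check_tools is a hypothetical helper that relies only on the shell built-in “command -v”. The demonstration uses one program that exists on practically every system (sh) and one made-up name.

```shell
# check_tools NAME...
# prints the names of executables that cannot be found on the PATH
check_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# in the tutorial session one would run, for example:
#   check_tools MITObim.pl mira miraconvert blastn muscle
# demonstration with one existing tool and one made-up name
missing=$(check_tools sh no_such_tool_xyz)
echo "missing: $missing"
```

An empty result means that all listed programs were found.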

3.2 Sequence Read Data Preprocessing and Initial Assembly

Read data used in the current tutorial has already been preprocessed. In brief, sequences have been quality controlled and filtered. Additionally, overlapping paired-end reads have been merged to achieve longer reads where possible. See [4] for details.

Among the most important factors for successful genome assembly is the read coverage depth (e.g., 50-fold coverage, often stated as “50×”), i.e., the number of sequence reads in which a given nucleotide position in the underlying molecule is represented. The relatively small size of animal mitochondrial genomes (~15 to 16 kbp), in combination with the mitochondrial genome being present in multiple copies per cell, often results in high coverage of mitochondrial reads compared to nuclear reads in genomic data sets. Perhaps counterintuitively, excessive coverage does not necessarily yield the best assembly results. Even after rigorous quality filtering, NGS read data is by no means free of error, and high coverage worsens the signal-to-noise ratio, which complicates (and sometimes worsens) assembly and results in unnecessarily long run times and increased computational requirements. Most assembly software, including the MIRA assembler, is designed for coverage depths of ~70–100×. For best performance I therefore recommend subsampling read data to these coverage levels. Furthermore, as an important strategy for the notoriously difficult assessment of genome assembly quality (especially without a reference genome), I propose repeated assembly of random subsets of the data to verify results by comparison, thus leveraging the extensive read coverage in mitochondrial sequence data.

Begin by performing a single MITObim iteration (“-end 0”), after which we will assess the coverage to decide on a suitable subsample size (runtime ~30 min).

# declare some variables, specify file locations, etc.
# These will be reused throughout the session
# $(pwd) inserts the full path to your present working directory
$ sample=WoollyMammoth
$ ref_id=Trichechus_manatus.AM904728.full
$ readpool=$(pwd)/raw_reads/ERR852028.fastq.gz
$ reffasta=$(pwd)/mt_refs/Trichechus_manatus.AM904728.fasta
# make and enter a new analysis directory
$ mkdir $sample.$ref_id
$ cd $sample.$ref_id
# run MITObim for a single iteration
# Note that all output will be written to the log file
$ MITObim.pl -end 0 -sample $sample -ref $ref_id \
  --quick $reffasta -readpool $readpool \
  --trimreads &> MITObim.log.txt
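The coverage reasoning above can be turned into a quick back-of-the-envelope calculation: average coverage is roughly the number of reads times the mean read length divided by the genome length, and the fraction of reads to keep is the target coverage divided by the observed coverage. The sketch below assumes this simple proportional rule; mean_coverage and subsample_percent are hypothetical helper names, and the numbers are illustrative only, not taken from the mammoth data.

```shell
# mean_coverage N_READS MEAN_READ_LENGTH GENOME_LENGTH
# expected average coverage depth = reads x read length / genome length
mean_coverage() {
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * l / g }'
}

# subsample_percent OBSERVED_COVERAGE TARGET_COVERAGE
# percentage of reads to keep in order to reach the target coverage
subsample_percent() {
    awk -v o="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * t / o }'
}

# illustrative numbers only: 20,000 reads of 80 bp on a 16,000-bp genome
cov=$(mean_coverage 20000 80 16000)
pct=$(subsample_percent "$cov" 50)
echo "coverage: ${cov}x, keep ${pct}% of reads for 50x"
```

Because only mitochondrial reads count toward mitochondrial coverage, the read number to plug in is the number of baited reads, not the size of the full genomic read pool.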

MITObim will generate a new directory for each iteration. In the case of the above command, a single directory “iteration0/” will have been produced that contains all intermediate files. For details on all the files produced by MIRA, please refer to the MIRA documentation. We have saved a log of the previous run to the text file “MITObim.log.txt”, which can be opened in any text editor. M. primigenius sequence reads will only have been baited and assembled to regions of the T. manatus mitochondrial genome that are sufficiently conserved (by default MIRA allows for 15% mismatch). MITObim will by default maintain the relative position between conserved regions and indicate as yet unresolved regions in a given iteration by stretches of Ns. Output files will be given a prefix that is composed of the sample and reference name that you specified when running MITObim.

For the tutorial we will have the assembly result in fasta format displayed in the Unix shell. The assembly can also be explored at the read level in a suitable viewer, such as Tablet [22]. The assembly result can be converted into a rich data format (ace) that is natively supported by Tablet using the program miraconvert of the MIRA assembly suite. The resulting file “result_iteration0.ace” can be loaded directly into Tablet, which also summarizes basic assembly statistics. A textual summary of the assembly stats was also generated by MIRA and can be displayed in the shell.

# set prefix
$ prefix=$sample-$ref_id
# show assembly result in fasta format
$ cat iteration0/$prefix-it0_noIUPAC.fasta
# convert assembly to ace format for visualization in Tablet
$ it=iteration0
$ result_dir=$it/$prefix\_assembly/$prefix\_d_results
$ miraconvert $result_dir/$prefix\_out.maf \
  result_iteration0.ace
# display textual summary of assembly result
$ info_dir=$it/$prefix\_assembly/$prefix\_d_info
$ cat $info_dir/$prefix\_info_contigstats.txt
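Since unresolved regions are written as stretches of Ns, a quick way to gauge how much of the sequence has been reconstructed after an iteration is to count resolved and unresolved bases in the fasta file. The sketch below is an optional addition rather than part of MITObim; count_resolved is a hypothetical helper, demonstrated on a toy sequence.

```shell
# count_resolved FASTA
# prints: total bases, unresolved (N) bases, resolved bases
count_resolved() {
    awk '!/^>/ {
             seq = toupper($0)
             total += length(seq)
             n += gsub(/N/, "", seq)   # gsub returns the number of replacements
         }
         END { printf "%d %d %d\n", total, n, total - n }' "$1"
}

# demonstration on a toy sequence: 12 bases, 4 of them N
tmpfa=$(mktemp)
printf '>toy\nACGTNN\nNNACGT\n' > "$tmpfa"
stats=$(count_resolved "$tmpfa")
echo "total/N/resolved: $stats"
rm "$tmpfa"
```

In the tutorial session it could be applied to iteration0/$prefix-it0_noIUPAC.fasta.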

Average read coverage is calculated across the entire sequence, so it is not very informative in this case. Maximum coverage in a few short regions is ~1700×, but be aware that the presence of repetitive elements may locally inflate coverage. In the absence of a better estimate for overall average mitochondrial coverage, we therefore choose a cautious subsampling strategy and randomly sample 20% of the data before we proceed. MITObim provides a script that allows you to draw random subsamples of the desired size, given in percent of the total number of reads. For every read, it decides whether to keep or drop it based on a pseudo-random number. Specifying the seed for the pseudo-random number generator allows you to replicate a subsample exactly if required.

# return to the base working directory
$ cd ..
# specify the location and file prefix for the new file
# containing the subsampled reads
$ outprefix=./raw_reads/ERR852028.sub20.rep01
# randomly sample ~20% of reads. Note the seed
$ downsample.py -r $readpool -s 20 \
  --seed 47271479 | gzip > $outprefix.fastq.gz
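To make the keep/drop principle concrete, here is a small sketch of seeded random subsampling written in awk. It is not the code of MITObim's downsample.py, and because awk uses a different pseudo-random number generator, the same seed will not select the same reads as the MITObim script; subsample_fastq is a hypothetical helper that assumes four lines per FASTQ record.

```shell
# subsample_fastq PERCENT SEED < in.fastq > out.fastq
# keeps each four-line FASTQ record with probability PERCENT/100
subsample_fastq() {
    awk -v p="$1" -v seed="$2" '
        BEGIN { srand(seed) }
        NR % 4 == 1 { keep = (rand() * 100 < p) }   # decide once per record
        keep'
}

# synthetic input: 200 toy reads with distinct names
toy=$(awk 'BEGIN { for (i = 1; i <= 200; i++) printf "@r%d\nACGT\n+\nIIII\n", i }')

# drawing twice with the same seed yields the same subsample
sub_a=$(printf '%s\n' "$toy" | subsample_fastq 20 47271479)
sub_b=$(printf '%s\n' "$toy" | subsample_fastq 20 47271479)
n_lines=$(printf '%s' "$sub_a" | awk 'END { print NR }')
echo "kept $((n_lines / 4)) of 200 reads"
```

Note that the number of reads kept is only approximately the requested percentage, because each read is decided independently.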

3.3 Assembling the Mitochondrial Genome of the Woolly Mammoth

In the following we will reconstruct the mitochondrial genome of M. primigenius using mt sequence data of T. manatus as bait for the initial assembly. I will demonstrate three different strategies that may be chosen based on the reference resources available or used complementarily as in the current tutorial. Note that each of the following exercises assumes that you start at your base working directory.

3.3.1 Full Mitochondrial Reference: Relative Position Conserved

This strategy will use the full mitochondrial genome of T. manatus as initial bait. After the first iteration, the relative position of conserved regions will be retained, and gaps between them will be closed iteratively. This strategy is recommended if full mitochondrial reference genome resources are available and no large structural variation (large indels, inversions) is expected between the target species and the reference sequence. We specify a maximum of 500 iterations, but MITObim is expected to converge onto a stable read number well before that. Additionally, we specify two optional parameters: “--trimreads” will switch on the MIRA internal read trimming routine, and “--clean” will force MITObim to remove results from all but the last two iterations at any given time, which reduces disk usage. Runtime for this analysis is ~6 h.

# update variables and input files where necessary
# the variable $sample will be used throughout
$ sample=WoollyMammoth-sub20-rep01
$ ref_id=Trichechus_manatus.AM904728.mtgenome
$ readpool=$(pwd)/raw_reads/ERR852028.sub20.rep01.fastq.gz
# make and enter a new analysis directory
$ mkdir $sample.$ref_id
$ cd $sample.$ref_id
$ MITObim.pl -end 500 -sample $sample -ref $ref_id \
  --quick $reffasta -readpool $readpool --clean \
  --trimreads &> MITObim.log.txt
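The idea of converging onto a stable read number can be illustrated with a few lines of shell. The sketch below is a simple stand-in for such a stopping criterion, and MITObim's internal check may differ; first_plateau is a hypothetical helper that takes the number of baited reads per iteration as arguments (how to extract these numbers from the MITObim log is not shown here), and the read counts are made up.

```shell
# first_plateau COUNT...
# prints the index (starting at 0) of the first iteration whose read
# count is not larger than that of the previous iteration
first_plateau() {
    prev=-1
    i=0
    for count in "$@"; do
        if [ "$i" -gt 0 ] && [ "$count" -le "$prev" ]; then
            echo "$i"
            return 0
        fi
        prev=$count
        i=$((i + 1))
    done
    echo "none"
}

# made-up read counts for six iterations (iteration 0 to 5)
plateau=$(first_plateau 120 340 610 790 790 790)
echo "read number stationary at iteration $plateau"
```

A result of "none" would indicate that the read number was still growing when the last iteration finished.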

MITObim should reach a stationary read number after 47 iterations and produce a single contiguous sequence (contig) of ~17,800 bp length. The average coverage of the final result is ~100. You can explore the result and assembly stats as before (see Subheading 3.2 above). Strategies for quality assessment are demonstrated in Subheading 3.4 below.

3.3.2 Full Mitochondrial Reference: Split Conserved Regions

This strategy will again use the full mitochondrial genome of a related species as starting reference but does not enforce the relative position of the initially identified conserved regions. It splits (“--split”) the result into separate assembly seeds wherever resolved regions are separated by gaps. Each seed is then extended separately. To avoid spurious assemblies, we will filter out results shorter than 200 bp (“--min_len 200”) and those with average coverage below 50 (“--min_cov 50”). This strategy reduces the constraints imposed on the assembly by the initial structure of the reference with respect to orientation and position of conserved regions and can be used when a full mt genome reference is available, but rearrangements are expected. Runtime for this analysis is ~6 h.

    # update variables and input files where necessary
    $ ref_id=Trichechus_manatus.AM904728.split
    # make and enter a new analysis directory
    $ mkdir $sample.$ref_id
    $ cd $sample.$ref_id
    $ MITObim.pl -end 500 -sample $sample -ref $ref_id \
      --quick $reffasta -readpool $readpool --clean --split \
      --trimreads --min_len 200 --min_cov 50 &> MITObim.log.txt
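The effect of the two filters can be pictured as follows. This is a minimal sketch of the filtering logic as described in the text, not MITObim code; the function name, the dictionary layout, and the contig values are invented, and whether the thresholds are inclusive is an assumption.

```python
def filter_contigs(contigs, min_len=200, min_cov=50):
    """Keep contigs that are long enough and sufficiently covered."""
    return [c for c in contigs
            if c["length"] >= min_len and c["coverage"] >= min_cov]

# invented example values, for illustration only
contigs = [
    {"name": "c1", "length": 5400, "coverage": 96.0},
    {"name": "c2", "length": 150, "coverage": 120.0},  # too short
    {"name": "c3", "length": 900, "coverage": 12.5},   # coverage too low
]
kept = filter_contigs(contigs)
print([c["name"] for c in kept])
```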

MITObim reaches a stationary read number after 48 iterations and produces five contigs. Note that the above strategy will not merge overlapping contigs automatically. We will therefore resume the process for one further iteration (starting from the assembly result of iteration 48 in maf format) and assemble the putative mitochondrial read pool identified in the previous iterations completely de novo, i.e., removing all assembly constraints imposed by the result of the previous iteration. Runtime is ~20 s.

    # update variables and input files where necessary
    $ prefix=$sample-$ref_id
    $ result_dir=iteration48/$prefix\_assembly/$prefix\_d_results
    $ maf=$result_dir/$prefix\_out.maf
    $ readpool=iteration48/$sample-readpool-it48.fastq
    $ MITObim.pl -start 49 -end 49 -sample $sample -ref $ref_id \
      -maf $maf -readpool $readpool --denovo --trimreads \
      --min_len 500 --min_cov 50 &> MITObim.denovo.log.txt
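To see why separately extended seeds can end up as overlapping contigs that still need to be joined, consider the following toy merge based on an exact suffix/prefix overlap. This is not what MITObim does in the step above (there the read pool is reassembled de novo by MIRA, using read-level evidence and tolerating mismatches); the function name and the sequences are invented for illustration.

```python
def merge_by_overlap(left, right, min_overlap=5):
    """Join two contigs if a suffix of left equals a prefix of right."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):  # prefer long overlaps
        if left.endswith(right[:size]):
            return left + right[size:]
    return None  # no sufficiently long exact overlap found

# two invented contigs sharing the 11 bp stretch TTTACGTACGA
contig_a = "AAACCCGGGTTTACGTACGA"
contig_b = "TTTACGTACGAGGCCTTAAG"
merged = merge_by_overlap(contig_a, contig_b)
print(merged)
```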

This final assembly step produces a single contig of length ~16,900 bp. General strategies for assembly quality assessment are demonstrated in Subheading 3.4. Note that the de novo assembly step applied in the last iteration can in principle also be applied as an overall strategy for the MITObim procedure, i.e., the read pool is assembled de novo in each iteration.

3.3.3 Short Barcode Reference

This strategy uses the nucleotide sequence of a single short reference sequence as bait, which will be extended iteratively until a stable read number (and presumably a circular molecule) is reached. This will be the method of choice if only limited reference resources are available. We will use the nucleotide sequence of the cytochrome oxidase I gene (COI) as an example bait. Runtime is ~7 min.

    # update variables and input files where necessary
    $ ref_id=Trichechus_manatus.AM904728.COI
    $ reffasta=$(pwd)/mt_refs/$ref_id.nt.fasta
    # make and enter a new analysis directory
    $ mkdir $sample.$ref_id
    $ cd $sample.$ref_id
    $ MITObim.pl -end 500 -sample $sample -ref $ref_id \
      --quick $reffasta -readpool $readpool --clean \
      --trimreads --split --min_len 200 --min_cov 50

MITObim aborts after iteration 1, indicating that no reads with a reasonable match to the provided reference could be found. By default, MITObim baits reads that share at least one subsequence of length k = 31 with the reference. COI is a relatively fast-evolving gene, and it is therefore not unexpected that M. primigenius and T. manatus, which shared a most recent common ancestor (MRCA) some 75 million years ago, should have diverged far enough that this criterion is no longer fulfilled for this gene. We will therefore try again, but reduce the length of the minimum match to 21 bp (“--kbait 21”). Runtime is ~7 min.

    # remove results from previous iterations
    $ rm -rf iteration*
    $ MITObim.pl -end 1 -sample $sample -ref $ref_id \
      --quick $reffasta -readpool $readpool --clean \
      --kbait 21 --trimreads --split --min_len 200 --min_cov 50
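The baiting criterion and the effect of lowering k can be illustrated with a toy example. This is a sketch of the criterion as described in the text (at least one shared subsequence of length k), not of MITObim's actual implementation; the function name and the sequences are invented.

```python
import random

def shares_kmer(read, reference, k):
    """True if read and reference share an identical substring of length k."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(read[i:i + k] in ref_kmers for i in range(len(read) - k + 1))

# toy data (invented): a 75 bp reference and a read that differs from it at
# positions 25 and 50, so the longest identical stretch is 25 bp
reference = "".join(random.Random(7).choices("ACGT", k=75))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
read = "".join(swap[b] if i in (25, 50) else b
               for i, b in enumerate(reference))

for k in (31, 21):
    print(k, shares_kmer(read, reference, k))
```

Every 31-mer of the toy read spans at least one mismatch, whereas its first 25 bases match the reference exactly, so only the shorter k can bait it. The price of a shorter k is a higher chance of baiting unrelated reads, which is why the text returns to the default once a genuine seed has been found.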

After one iteration, MITObim has recovered one contiguous sequence of length 548 bp. The program blastn of the BLAST+ command line suite of programs can be used to quickly verify that the recovered contig is a genuine fragment of COI by comparing it to the T. manatus COI sequence that we used as reference. Note that by default blastn expects at least a single perfect match of length 28 bp between query and subject. This behavior can be adjusted via the option “-word_size”, which we will set to 21 bp, according to the length k we have used before.

    # define location of the query file
    $ prefix=$sample-$ref_id
    $ query=iteration1/$prefix-it1_noIUPAC.fasta
    # compare contig to COI reference sequence
    $ blastn -query $query -subject $reffasta -word_size 21

The output shows a full-length match with the T. manatus COI sequence, with an overall similarity of ~80%. Now that we have identified a likely genuine COI seed sequence, we can resume MITObim with the default k = 31. Runtime is ~9 h.

    # specify location of maf file
    $ result_dir=iteration1/$prefix\_assembly/$prefix\_d_results
    $ maf=$result_dir/$prefix\_out.maf
    $ MITObim.pl -start 2 -end 500 -sample $sample -ref $ref_id \
      -maf $maf -readpool $readpool --clean \
      --trimreads &> MITObim.log.txt

Using this strategy, MITObim yields a single contig of length 16,870 bp after 75 iterations.

3.4 Assessing Mitochondrial Genome Assembly Quality

Assessing the quality of genome assemblies without a reference sequence is notoriously difficult. In this section we will discuss a number of strategies particularly useful for assessing mitochondrial genome reconstruction and will demonstrate them on the result obtained in Subheading 3.3.3 above.

3.4.1 Circularity

Mitochondrial genomes are circular molecules, and the detection of circularity in the assembly result may be employed as a quality criterion. Convergence to a stable number of baited reads during the iterative assembly procedure employed by MITObim may be considered a first indication of the expected circularity. In addition, MITObim provides a script that attempts to detect circularity in nucleotide sequences. For a given nucleotide sequence, the script determines all occurring unique subsequences of length k, or kmers, and their position in the original nucleotide sequence. It then reports the number of kmers that are found duplicated at each observed distance between duplicates. The occurrence of multiple kmer duplicates at the same distance that coincides with the expected length of the mitochondrial genome can be interpreted as a sign of circularity. It should be noted that the presence of tandem repeats longer than the kmer length can be problematic for accurately establishing circularity and the correct length of the circular molecule.

    # specify variables
    $ ref_id=Trichechus_manatus.AM904728.COI
    $ result_dir=$sample.$ref_id/iteration75
    $ fasta=$result_dir/$sample-$ref_id-it75_noIUPAC.fasta
    # search for indication of circularity with k=31
    $ circules.py -f $fasta -k 31
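The kmer-distance idea described above can be sketched as follows. This is an independent toy illustration, not the code of circules.py; the function name and the test sequence are assumptions. A linear contig that was assembled "once around" a circular molecule repeats its beginning at its end, so many kmers recur at a distance equal to the length of the circle.

```python
import random
from collections import Counter, defaultdict

def duplicate_distances(seq, k):
    """Count, per distance, how many kmers recur that far apart."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    distances = Counter()
    for hits in positions.values():
        for first, second in zip(hits, hits[1:]):  # consecutive occurrences
            distances[second - first] += 1
    return distances

# toy data (invented): a 300 bp circular "genome" whose linear assembly
# runs 40 bp past the origin
genome = "".join(random.Random(3).choices("ACGT", k=300))
contig = genome + genome[:40]

distances = duplicate_distances(contig, 15)
print(distances.most_common(3))
```

The distance supported by the most duplicated kmers (here 300) is the candidate length at which to clip the sequence. A tandem repeat longer than k would add many duplicates at short distances, which is the caveat noted in the text.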

The script suggests clipping the sequence to a length of 16,715 bp (supported by 125 duplicated 31mers), which is in the expected length range for a mammalian mitochondrial genome. It also reports a hexanucleotide motif (“ACGCAT”) that is tandemly repeated (21×) in the assembled sequence. Incidentally, the mitochondrial control region (D-loop) is known to contain short hexanucleotide tandem repeats in many vertebrates, including mammoths, where the tandemly repeated motif was reported as “CGCATA” [23].

    # clip sequence at suggested clipping points
    # specify a prefix for the output via -p
    $ circules.py -f $fasta -k 31 -c 0,16715 -p COI

The previous command creates two fasta files: the first contains the putative circular nucleotide sequence clipped according to the clipping coordinates specified, and the second contains a 1000 bp fragment constructed by joining the “ends” of the putative circular sequence at its center. The latter file can be used to verify circularity by using it as a seed for a single round of MITObim reconstruction with the putative mitochondrial read pool previously identified. If the test sequence does not represent a genuine contiguous mitochondrial sequence, we would expect the assembly result to reveal discrepancies. Note that after the initial baiting procedure, the reads will be assembled de novo, i.e., results will be based exclusively on overlap information present in sequence reads and unconstrained by the test sequence. Runtime is ~6 s.

    # define variables and input files where necessary
    $ result_dir=$(pwd)/$sample.$ref_id/iteration75
    $ readpool=$result_dir/$sample-readpool-it75.fastq
    $ ref_id=circularity.test.COI
    $ reffasta=$(pwd)/COI.16715.for-testing.fasta
    $ mkdir $sample.$ref_id
    $ cd $sample.$ref_id
    $ MITObim.pl -end 0 -sample $sample -ref $ref_id \
      -readpool $readpool --quick $reffasta --denovo \
      --trimreads &> MITObim.log.txt
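The construction of the test fragment can be pictured as taking the last and the first 500 bp of the clipped sequence and joining them, so that the putative junction of the circle sits in the middle. The sketch below assumes exactly this construction; how circules.py builds the fragment internally is not shown in this chapter, and the function name and toy sequence are invented.

```python
import random

def junction_fragment(circular_seq, length=1000):
    """Join the end and the start of a clipped circular sequence."""
    half = length // 2
    return circular_seq[-half:] + circular_seq[:half]

# toy data (invented): a random 2000 bp sequence standing in for the
# clipped mitochondrial genome
clipped = "".join(random.Random(11).choices("ACGT", k=2000))
fragment = junction_fragment(clipped)

# if the molecule is truly circular, the fragment is a contiguous piece of
# it, i.e., it can be found when the sequence is written out twice in a row
print(len(fragment), fragment in clipped + clipped)
```

Reads that genuinely span the junction will support this fragment when it is used as a seed, which is what the single round of MITObim above tests.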

Now, we’ll use blastn to compare the result of this last assembly to the seed sequence that was constructed by joining the ends of the putative circular sequence.

    $ query=iteration0/$sample-$ref_id-it0_noIUPAC.fasta
    $ blastn -query $query -subject $reffasta

The result is 100% identical to the test sequence, thus confirming the circularity of the nucleotide sequence assembled by MITObim.

3.4.2 Comparison to Reference Nucleotide Sequence

The most intuitive quality control for an assembly result is to compare it to a reference genome. However, MITObim allows mitochondrial genome reconstruction to be seeded by very distantly related reference sequences that may not allow for a comprehensive comparative assessment of the assembly result at the nucleotide level. Nevertheless, in the following we again use the blastn program to compare the assembly result to the mitochondrial genome of T. manatus, keeping in mind that the two species diverged some 75 million years ago.

    # specify input files
    $ query=COI.circular.16715.fasta
    $ reffasta=mt_refs/Trichechus_manatus.AM904728.fasta
    # compare using blastn
    $ blastn -query $query -subject $reffasta -word_size 21

The search returns three matches with 74–79% sequence identity across a cumulative length of ~16 kbp despite the extensive evolutionary distance, which indicates a high-quality genome reconstruction.

3.4.3 Gene Complement

Conservation at the protein level facilitates comparisons across greater evolutionary distances. Strictly speaking, the exact gene complement and order cannot be known a priori when assembling a previously uncharacterized mitochondrial genome. However, with some exceptions, animal mitochondrial genomes are conserved with respect to these characteristics (reviewed in [24]), and the confirmation of a standard mitochondrial gene complement can be considered a further quality criterion for mitochondrial genome assemblies. Exact prediction of mitochondrial genes is not trivial, but a quick overview can be achieved in seconds by comparing the gene complement of a reference species to a newly assembled mitochondrial genome using the program tblastn of the BLAST+ command line suite, a program that is designed to compare protein against nucleotide sequences.

    # specify input files
    $ refproteins=mt_refs/Trichechus_manatus.AM904728.CDS.aa.fasta
    $ ntfasta=COI.circular.16715.fasta
    # compare using tblastn
    $ tblastn -query $refproteins -subject $ntfasta
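If the search is repeated with tabular output (the BLAST+ option “-outfmt 6”, which is not used in the command above), the gene complement can be tallied automatically. The sketch below assumes that the query identifiers are plain gene names, which may not be the case for the reference file used in this tutorial; the example lines are invented and do not reproduce the actual result.

```python
# the 13 protein-coding genes of a standard animal mitochondrial genome
EXPECTED = {"ATP6", "ATP8", "COX1", "COX2", "COX3", "CYTB", "ND1",
            "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6"}

def genes_with_hits(tabular_lines, max_evalue=1e-5):
    """Collect query ids with at least one hit at or below max_evalue."""
    found = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, evalue = fields[0], float(fields[10])  # columns 1 and 11
        if evalue <= max_evalue:
            found.add(query)
    return found

# invented example lines in the 12-column layout of -outfmt 6
lines = [
    "\t".join(["COX1", "contig", "92.1", "510", "40", "0", "1", "510",
               "5300", "6829", "0.0", "950"]),
    "\t".join(["ND1", "contig", "81.3", "315", "59", "0", "1", "315",
               "2740", "3684", "2e-150", "430"]),
    "\t".join(["ATP8", "contig", "40.0", "30", "18", "0", "5", "34",
               "7750", "7839", "0.5", "22"]),
]
found = genes_with_hits(lines)
missing = sorted(EXPECTED - found)
print(sorted(found), missing)
```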

The search returns highly significant matches for 12 of the 13 mitochondrial genes (all except ATP8 match across >85% of their length) of T. manatus. More elaborate prediction of mitochondrial genes can be achieved, e.g., using MITOS [25], which is maintained via a webserver (http://mitos.bioinf.uni-leipzig.de, last accessed 16 Jan 2019). MITOS detects the full set of mitochondrial coding, rRNA, and tRNA genes in our assembly result, indicating high quality.

3.4.4 Verification by Repeated Assembly with Variation

As discussed briefly in Subheading 3.2 above, an experimental approach to genome assembly may provide a powerful evaluation strategy. I therefore suggest testing assembly results for robustness across multiple technical replicates using random subsets of reads and/or a range of different parameter settings. While this approach may be impractical for large, complex nuclear genomes due to limited computational resources, such experiments are feasible on modern desktop computers for mitochondrial genomes. In the following, we will perform multiple sequence alignment of the assembly results that we have obtained via the three strategies outlined in Subheading 3.3 using the program MUSCLE [21] and compare the results. We will first “roll” the putatively circular results obtained via the three strategies to the same starting position. We choose position 2520 in the result from Subheading 3.3.3, which was identified as the starting base of tRNA-Phe by MITOS, but the choice is arbitrary. Note that results from the strategies in Subheadings 3.3.1 and 3.3.2 first need to be clipped as demonstrated in Subheading 3.4.1.

    # roll result from Subheading 3.3.3 to new starting position
    $ circules.py -n 2520 -p COI -f COI.circular.16715.fasta
    # clip result from Subheading 3.3.1
    $ ref_id=Trichechus_manatus.AM904728.mtgenome
    $ result_dir=$sample.$ref_id/iteration47
    $ fasta=$result_dir/$sample-$ref_id-it47_noIUPAC.fasta
    $ circules.py -f $fasta -k 31 -c 0,16886 -p mtgenome
    # identify the homologous starting position
    $ blastn -query COI.rolled.2520.fasta \
      -subject mtgenome.circular.16886.fasta
    # roll the result from Subheading 3.3.1 to position 16775
    # according to the blastn result
    $ circules.py -n 16775 -p mtgenome \
      -f mtgenome.circular.16886.fasta
    # clip result from Subheading 3.3.2
    $ ref_id=Trichechus_manatus.AM904728.split
    $ result_dir=$sample.$ref_id/iteration49
    $ fasta=$result_dir/$sample-$ref_id-it49_noIUPAC.fasta
    $ circules.py -f $fasta -k 31 -c 0,16727 -p split
    # identify the homologous starting position
    $ blastn -query COI.rolled.2520.fasta \
      -subject split.circular.16727.fasta
    # roll the result from Subheading 3.3.2 to position 49
    # according to the blastn result
    $ circules.py -n 49 -p split -f split.circular.16727.fasta
    # concatenate all results into a single file
    $ cat *rolled* > M_primigenius.results.fasta
    # align with muscle - output in fasta format
    $ muscle -in M_primigenius.results.fasta \
      -out M_primigenius.results.aln.fasta
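“Rolling” a circular sequence means rotating its linear representation so that a chosen position becomes the first base, without changing the molecule it represents. A minimal sketch follows, assuming 1-based positions (whether the “-n” option of circules.py counts from 0 or from 1 is not stated here); the function name is invented.

```python
def roll(circular_seq, new_start):
    """Rotate a circular sequence so that 1-based new_start comes first."""
    offset = (new_start - 1) % len(circular_seq)
    return circular_seq[offset:] + circular_seq[:offset]

# letters instead of bases make the rotation easy to follow
rolled = roll("ABCDEFGH", 3)
print(rolled)
```

Rolling all results to a homologous starting position is what makes the subsequent linear multiple sequence alignment meaningful, because alignment programs such as MUSCLE have no notion of circularity.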

Inspecting the multiple sequence alignment using a software solution of your choice, e.g., Jalview [26], AliView [27], or MEGA [28], reveals minimal disagreement between assembly results. Differences are largely restricted to what appears to be the mitochondrial control region (D-loop), a section characterized by tandem repeats in many animals, which is often impossible to resolve unambiguously using short-read sequencing technologies. Generally, discrepancies between technical replicates may indicate problematic regions of the assembly that can be investigated visually at the read level using appropriate visualization tools (see Subheading 3.4.5 below) and resolved either by assembly parameter tuning, which is beyond the scope of this tutorial, or by targeted Sanger sequencing.

3.4.5 Visual Evaluation of Assembly Results

While manual inspection of assemblies at the read level often has to be restricted to key regions of interest for large complex genomes, the relatively small size of animal mitochondrial genomes makes a critical visual inspection of the entire result feasible. A number of graphical visualization tools exist that help to visually identify problematic assembly regions characterized by increased conflict at the read level. Tablet [22] is convenient for visualizing assemblies generated with MIRA because it natively supports the ace format (see Subheading 3.3.1 above for how to generate an assembly in ace format). IGV [29] is another widely used tool for visualization of genome assemblies but requires input in BAM format, which is not directly supported by MIRA.

4 Note

1. The assembly of (mitochondrial) genomes is a complex and fast-developing field, and technologies, algorithms, and approaches are constantly being improved. Successful accurate reconstruction of mitochondrial genomes depends on many factors, including particular characteristics of the target genome, such as repeat content, GC content, etc. A number of tools have been developed for the assembly of organellar genomes (see Table 2). To date, MITObim appears to be the most widely used and has provided robust results across a wide range of taxa, but there is no “one-size-fits-all” solution to the complex problem of mitochondrial genome assembly.

Table 2 Non-exhaustive summary of existing tools specifically designed for the assembly of organellar genomes from short-read data (all last accessed 16 Jan 2019)

Short name    URL                                                      Reference
ARC           https://ibest.github.io/ARC/                             [30]
MIA           https://github.com/mpieva/mapping-iterative-assembler    [31]
MITObim       https://github.com/chrishah/MITObim                      [16]
ORG.Asm       https://git.metabarcoding.org/org-asm/org-asm
GRAbB         https://github.com/b-brankovics/grabb                    [32]
NOVOPlasty    https://github.com/ndierckx/NOVOPlasty                   [33]

References

1. Green RE et al (2010) A draft sequence of the Neandertal genome. Science 328:710–722
2. Meyer M et al (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science 338:222–226
3. Schubert M et al (2014) Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc Natl Acad Sci U S A 111:E5661–E5669
4. Palkopoulou E et al (2015) Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr Biol 25:1395–1400
5. Węcek K et al (2016) Complex admixture preceded and followed the extinction of wisent in the wild. Mol Biol Evol. https://doi.org/10.1093/molbev/msw254
6. Barnett R et al (2016) Mitogenomics of the extinct cave lion, Panthera spelaea (Goldfuss, 1810), resolve its position within the Panthera cats. Open Quaternary 2:4
7. Lindqvist C et al (2010) Complete mitochondrial genome of a Pleistocene jawbone unveils the origin of polar bear. Proc Natl Acad Sci U S A 107:5053–5057
8. Llamas B et al (2016) Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the Americas. Sci Adv 2:e1501385
9. Hervella M et al (2016) The mitogenome of a 35,000-year-old Homo sapiens from Europe supports a Palaeolithic back-migration to Africa. Sci Rep 6:25501
10. Soubrier J et al (2016) Early cave art and ancient DNA record the origin of European bison. Nat Commun 7:13158
11. Paijmans JLA, Gilbert MTP, Hofreiter M (2013) Mitogenomic analyses from ancient DNA. Mol Phylogenet Evol 69:404–416
12. Soares AER et al (2016) Complete mitochondrial genomes of living and extinct pigeons revise the timing of the columbiform radiation. BMC Evol Biol 16:230
13. Gansauge M-T, Meyer M (2014) Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res 24:1543–1549
14. Carpenter ML et al (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852–864
15. Nagarajan N, Pop M (2013) Sequence assembly demystified. Nat Rev Genet 14:157–167
16. Hahn C, Bachmann L, Chevreux B (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucleic Acids Res 41(13):e129
17. Chevreux B, Wetter T, Suhai S et al (1999) Genome sequence assembly using trace signals and additional sequence information. German Conf Bioinformatics 99:45–56
18. Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM (2010) The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 38:1767–1771
19. Arnason U et al (2008) Mitogenomic relationships of placental mammals and molecular estimates of their divergences. Gene 421:37–51
20. Camacho C et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421
21. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
22. Milne I et al (2013) Using tablet for visual exploration of second-generation sequencing data. Brief Bioinform 14:193–202
23. Rogaev EI et al (2006) Complete mitochondrial genome and phylogeny of Pleistocene mammoth Mammuthus primigenius. PLoS Biol 4:e73
24. Lavrov DV, Pett W (2016) Animal mitochondrial DNA as we do not know it: mt-genome organization and evolution in nonbilaterian lineages. Genome Biol Evol 8:2896–2913
25. Bernt M et al (2013) MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol 69:313–319
26. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:1189–1191
27. Larsson A (2014) AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30:3276–3278
28. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729
29. Robinson JT et al (2011) Integrative genomics viewer. Nat Biotechnol 29:24–26
30. Hunter SS et al (2015) Assembly by reduced complexity (ARC): a hybrid approach for targeted assembly of homologous sequences. bioRxiv 014662. https://doi.org/10.1101/014662
31. Green RE et al (2008) A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134:416–426
32. Brankovics B et al (2016) GRAbB: selective assembly of genomic regions, a new niche for genomic research. PLoS Comput Biol 12:e1004753
33. Dierckxsens N, Mardulyn P, Smits G (2017) NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res 45:e18


Beth Shapiro et al. (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 1963, https://doi.org/10.1007/978-1-4939-9176-1, © Springer Science+Business Media, LLC, part of Springer Nature 2019
