
Systematic Entomology (2021), DOI: 10.1111/syen.12508

Phylogenomic analyses clarify the pattern of evolution of Adephaga (Coleoptera) and highlight phylogenetic artefacts due to model misspecification and excessive data trimming

ALEXANDROS VASILIKOPOULOS 1, MICHAEL BALKE 2, SANDRA KUKOWKA 1, JAMES M. PFLUG 3, SEBASTIAN MARTIN 1, KAREN MEUSEMANN 4, LARS HENDRICH 2, CHRISTOPH MAYER 1, DAVID R. MADDISON 3, OLIVER NIEHUIS 4, ROLF G. BEUTEL 5 and BERNHARD MISOF 6

1 Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
2 Department of Entomology, SNSB-Bavarian State Collections of Zoology, Munich, Germany
3 Department of Integrative Biology, Oregon State University, Corvallis, OR, U.S.A.
4 Department of Evolutionary Biology and Animal Ecology, Institute of Biology I (Zoology), Albert-Ludwig University of Freiburg, Freiburg, Germany
5 Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Jena, Germany
6 Zoological Research Museum Alexander Koenig, Bonn, Germany

Abstract. Adephaga is the second largest suborder of Coleoptera and contains aquatic and terrestrial groups that are sometimes classified as Hydradephaga and Geadephaga, respectively. The phylogenetic relationships of Adephaga have been studied intensively, but the relationships of the major subgroups of Geadephaga and the placement of Hygrobiidae within Dytiscoidea remain obscure. Here, we infer new DNA-hybridization baits for exon-capture phylogenomics and we combine new hybrid-capture sequence data with transcriptomes to generate the largest phylogenomic taxon sampling within Adephaga presented to date. Our analyses show that the new baits are suitable to capture the target loci across different lineages of Adephaga. Phylogenetic analyses of moderately trimmed supermatrices confirm the hypothesis of paraphyletic ‘Hydradephaga’, with Gyrinidae placed as sister to all other families as in morphology-based phylogenies, even though quartet-concordance analyses did not support this result. All analyses conducted with site-heterogeneous models suggest Trachypachidae as sister to a clade Carabidae + Cicindelidae in congruence with results from morphological studies. Haliplidae is inferred as sister to Dytiscoidea, while a clade of Noteridae (+ most likely Meruidae) is inferred as sister to all remaining Dytiscoidea. A strongly supported clade Hygrobiidae + (Amphizoidae + monophyletic Aspidytidae) is inferred in most analyses of moderately trimmed supermatrices when a site-heterogeneous model is used. In general, we find that stringent trimming of supermatrices results in reduced deviation from model assumptions but also in reduction of phylogenetic information. We also find that site-heterogeneous C60 models provide greater stability of phylogenetic relationships of Adephaga across analyses of different amino-acid supermatrices than site-homogeneous models. Thus, site-heterogeneous C60 models can potentially reduce incongruence in phylogenomics. Lastly, we show that gene-tree errors are prominent in the data, even after sub-sampling genes to reduce these errors, but we also show that subsampling genes based on the likelihood mapping criterion in summary coalescent analyses results in higher topological congruence with the concatenation-based tree. Overall, our analyses demonstrate that moderate alignment trimming strategies, application of site-heterogeneous models and mitigation of gene-tree errors should be routinely included in the phylogenomic pipeline in order to more accurately infer the phylogeny of species.

Correspondence: Alexandros Vasilikopoulos, Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113 Bonn, Germany. E-mail: [email protected] Bernhard Misof, Zoological Research Museum Alexander Koenig, 53113 Bonn, Germany. E-mail: [email protected]

[Corrections added on 30 July 2020, after the first online publication: Copyright line has been updated]

© 2021 The Authors. Systematic Entomology published by John Wiley & Sons Ltd on behalf of Royal Entomological Society. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Introduction

Beetles (Coleoptera) are the most speciose insect order and their phylogeny has been the focus of attention for many decades (Crowson, 1960; Lawrence & Newton, 1982; Hunt et al., 2007; Lawrence et al., 2011; Beutel et al., 2019a, 2020; McKenna et al., 2019). Polyphaga is the largest beetle suborder with numerous phytophagous species (McKenna et al., 2019) but also many other feeding habits. Adephaga, which mostly includes predacious species, is the second largest beetle suborder with more than 45 000 species assigned to 11 families (Beutel et al., 2020; Duran & Gough, 2020). The family-level phylogenetic relationships of Adephaga have been extensively debated but scientists are now reaching a consensus on the most likely scenario of their evolution (McKenna et al., 2019; Beutel et al., 2020; Gustafson et al., 2020). Despite this, open questions remain, such as the phylogenetic relationships of the major terrestrial groups, the phylogenetic position of Hygrobiidae within Dytiscoidea and the intra-familial relationships within Carabidae, Cicindelidae and Dytiscidae (Michat et al., 2017; Vasilikopoulos et al., 2019; Beutel et al., 2020; Gustafson et al., 2020). In addition, previous analyses of family-level relationships in Adephaga have suggested that some results of previous studies might be artefacts due to systematic errors (Cai et al., 2020). In this study, we address these unresolved issues by combining newly generated exon-capture sequence data with transcriptomic sequence data to infer the phylogeny of Adephaga based on extensive sampling of species.

The majority of species diversity in Adephaga belongs to the terrestrial family Carabidae (ground beetles, >35 000 extant species), whereas the closely related family Cicindelidae is a medium-sized terrestrial group (tiger beetles, >2400 extant species). Trachypachidae is another terrestrial family with only six extant species (Beutel et al., 2020; Duran & Gough, 2020; Lorenz, 2020).
These families have been collectively referred to as ‘Geadephaga’ (Crowson, 1960). The monophyly of this unit has been disputed in the past based on analyses of morphological characters (Burmeister, 1976; Beutel & Roughley, 1988), but most recent morphological and molecular analyses suggest a single origin of the terrestrial adephagan groups (Beutel et al., 2006, 2020; Maddison et al., 2009; McKenna et al., 2019; Gustafson et al., 2020). In contrast to this, the phylogenetic relationships among Carabidae, Cicindelidae and Trachypachidae remain controversial as different phylogenomic analyses have produced different topologies. Phylotranscriptomic analyses have placed Trachypachidae as sister to Carabidae + Cicindelidae (McKenna et al., 2019). In contrast, analyses of mitochondrial genomes suggested a weakly supported clade of Cicindelidae + Trachypachidae as sister to Carabidae (López-López & Vogler, 2017), whereas analyses of ultraconserved elements (UCEs) suggested Cicindelidae + (Carabidae + Trachypachidae) (Gustafson et al., 2020). It should be noted, however, that the taxon sampling of previous phylogenomic studies was not sufficient to test the monophyly of Carabidae and Cicindelidae and to robustly infer the phylogenetic position of the small family Trachypachidae (Zhang et al., 2018b; McKenna et al., 2019; Gough et al., 2020; Gustafson et al., 2020). In addition, the results of some molecular analyses do not agree with results of morphological studies that suggest Trachypachidae as sister to Carabidae + Cicindelidae (Beutel et al., 2020). Therefore, a re-evaluation of the relationships of Geadephaga with careful examination of potential sources of systematic error and increased species sampling is needed.

The species of the remaining eight families of Adephaga (Amphizoidae, Aspidytidae, Dytiscidae, Haliplidae, Hygrobiidae, Meruidae, Noteridae and Gyrinidae) occur primarily in aquatic or semi-aquatic habitats (Jäch & Balke, 2008; Short, 2018). Most species of Dytiscidae, Gyrinidae, Hygrobiidae and Noteridae are strictly aquatic. Species of Amphizoidae are also aquatic, whereas Aspidytidae and Meruidae occur in hygropetric habitats (Kavanaugh, 1986; Balke et al., 2003; Spangler & Steiner, 2005; Vasilikopoulos et al., 2019). Crowson (1960) suggested that all the aquatic and semi-aquatic groups constitute a monophylum, which he referred to as ‘Hydradephaga’.
Only a few molecular phylogenetic studies have supported this concept (Shull et al., 2001; Ribera et al., 2002; McKenna et al., 2015; López-López & Vogler, 2017), whereas the monophyly of this group has been refuted in more comprehensive studies based on analyses of morphological characters and genomic data (Beutel & Roughley, 1988; Beutel et al., 2006, 2020; Baca et al., 2017a; Gustafson et al., 2020; McKenna et al., 2019). More specifically, the placement of Gyrinidae as sister to all other Adephaga is currently a well-accepted scenario (Baca et al., 2017a; Beutel et al., 2020; Beutel & Roughley, 1988; Gustafson et al., 2020; but see Freitas et al., 2021). In addition, most analyses suggest a sister group relationship of Haliplidae to the superfamily Dytiscoidea (which includes Amphizoidae, Aspidytidae, Dytiscidae, Hygrobiidae, Meruidae and Noteridae) and a clade Meruidae + Noteridae as sister to all remaining families of Dytiscoidea (Beutel et al., 2006; Baca et al., 2017a; Vasilikopoulos et al., 2019; Gustafson et al., 2020). Despite this, the phylogenetic position of the family Hygrobiidae (squeak beetles) within Dytiscoidea remains contentious (Toussaint et al., 2016; Baca et al., 2017a; Vasilikopoulos et al., 2019, 2021; Cai et al., 2020; Gustafson et al., 2020).

The issue of model and data selection has received considerable attention in the context of the phylogeny of insects and other groups (Misof et al., 2013; Lanfear et al., 2014; Song et al., 2016; Feuda et al., 2017; Ballesteros & Sharma, 2019; Cai et al., 2020; Kapli & Telford, 2020; Evangelista et al., 2021). Specifically, several studies have demonstrated that using unrealistic models of molecular evolution might result in spurious phylogenetic estimates (Lartillot et al., 2007; Song et al., 2010, 2016; Wang et al., 2019; Crotty et al., 2020; Kapli & Telford, 2020). It has also been suggested that selecting sites or genes with reduced deviation from model assumptions might be beneficial (Philippe et al., 2017; Simion et al., 2020). In contrast, other authors have shown that it is difficult to remove systematic bias from the data without removing phylogenetic signal at the same time (Mongiardino Koch & Thompson, 2021). Such issues relating to model misspecification and data selection were also recently discussed in the context of the phylogeny of Adephaga (Vasilikopoulos et al., 2019, 2021; Cai et al., 2020).
Heterogeneous composition of amino acids and nucleotides across taxa or across alignment sites, systematic bias resulting from hypervariable alignment sites, and deficient taxon sampling are among the potential factors affecting the internal phylogeny of the superfamily Dytiscoidea, including the monophyly of Aspidytidae (Baca et al., 2017a; Vasilikopoulos et al., 2019; Cai et al., 2020; Gustafson et al., 2020). Furthermore, it has been observed that summary coalescent and concatenation-based phylogenetic analyses often deliver incongruent topologies within Adephaga (Baca et al., 2017a; Gustafson et al., 2020; Freitas et al., 2021). However, the factors that contribute to this incongruence remain poorly understood (Baca et al., 2017a; Freitas et al., 2021). In particular, the extent of gene-tree errors in previous summary coalescent analyses of Adephaga and their effect on species-tree estimation remain uncertain (Baca et al., 2017a; Vasilikopoulos et al., 2019; Gustafson et al., 2020). Thus, two issues are imperative for a thorough assessment of the phylogenetic relationships of Adephaga in the light of increased taxon sampling: (i) evaluating the extent of gene-tree errors in summary coalescent analyses and (ii) using biologically realistic models in concatenation-based analyses.

The issue of data-collection strategies in phylogenetics has also been extensively discussed (McCormack et al., 2013; Young & Gillung, 2020) and several hybrid-enrichment (or sequence-capture) approaches for phylogenomics have been developed (Faircloth et al., 2012; Lemmon et al., 2012; Bragg et al., 2016; Mayer et al., 2016). The use of UCEs (Faircloth et al., 2012) is the only sequence-capture approach that has been applied to the phylogeny of Adephaga so far (Baca et al., 2017a; Gustafson et al., 2020). However, some authors have suggested the use of other sequence-capture or transcriptomic approaches in addition to or independent of the UCE approach (Bank et al., 2017; Karin et al., 2020) in an attempt to validate and compare results among studies (see also Vasilikopoulos et al., 2021). In this sense, hybrid-enrichment of protein-coding exons (Bank et al., 2017; Sann et al., 2018; Mayer et al., 2021) is another sequence-capture method that can provide complementary or independent evidence for testing the validity of previously suggested phylogenetic hypotheses of Adephaga.

Concerning the utility of the exon-capture approach across different scales of molecular divergence, previous research suggests it is only effective for investigating taxonomic clades characterized by small to moderate levels of molecular divergence (Bi et al., 2012; Bragg et al., 2016; Mayer et al., 2016). Nevertheless, if transcriptomic resources are available for a broad set of species within the group of interest, they can be used for testing the applicability of exon-specific DNA-hybridization baits at deeper evolutionary timescales with higher levels of molecular divergence. Additionally, recently developed bioinformatic approaches are able to automatically detect suitable regions for bait design in aligned DNA sequence data, including protein-coding data, by minimizing overall bait-to-target distances (Mayer et al., 2016). Therefore, these bioinformatic approaches offer a promising solution to the problem of designing probes with broad phylogenetic applicability (Lemmon & Lemmon, 2013). Transcriptomic and genomic resources for adephagan beetles have increased considerably in the last few years (Gustafson et al., 2019; McKenna et al., 2019; Vasilikopoulos et al., 2019). Combined with the above-mentioned bioinformatic approaches, these new data make it now possible to test the applicability and efficiency of exon capture for deep-level phylogenetics in Adephaga.
In this study, we develop a new set of DNA-hybridization baits specifically tailored to capture hundreds of single-copy protein-coding genes across adephagan lineages and generate new hybrid-capture data to infer the phylogeny of Adephaga. We test the efficiency of this set of baits for locus recovery in a large number of specimens. We also combine the newly generated hybrid-capture data with transcriptomes to generate the most species-rich phylogenomic dataset for adephagan beetles presented to date. In order to avoid biased estimates of phylogeny of Adephaga, we take measures to minimize phylogenetic artefacts by: (i) employing biologically realistic models of sequence evolution and (ii) by reducing potentially biasing factors in the data, using data-filtering strategies that select conserved alignment sites. We evaluate the effects of model misspecification and excessive data trimming both on the results of phylogenetic tree reconstructions and on quartet-based analyses of phylogenetic incongruence in an attempt to acquire a more detailed view of phylogenetic signal, conflict and bias in the backbone phylogeny of Adephaga. We also explore whether or not gene-tree discordance (GTD) can be explained by gene-tree estimation errors and suggest possible strategies for selecting informative genes that may increase congruence with concatenation-based analyses. Lastly, we discuss our results in the context of the morphological evolution of Adephaga.


Materials and methods

Taxon sampling

We combined 38 transcriptomes from 23 species of Adephaga and 15 outgroup species (File S1: Table S1) with newly generated exon-capture sequence data from 95 species of Adephaga (File S1: Table S2, note that two specimens of Hydrocanthus oblongus were initially processed but only one was included in the present study). Our initial taxon sampling comprised data from 118 species of Adephaga representing all families, except the monotypic Meruidae, and 21 outgroup species (two terminals of Hymenoptera, three of Mecopterida, two of Strepsiptera, four of Neuropterida, two of Myxophaga, two of Archostemata and six of Polyphaga). The initial taxon sampling includes the six reference species of the ortholog set (see below).

Inference of bait nucleotide sequences for hybrid enrichment of protein-coding exons

We used 24 transcriptomes of Adephaga as a basis to build codon-based nucleotide multiple sequence alignments (MSAs) of orthologous genes and search for MSA regions that are suitable for bait design within Adephaga (see File S1: Table S1 and File S2). First, we used a custom ortholog gene set consisting of 3085 ortholog clusters of single-copy genes (COGs) at the hierarchical level of Holometabola (Vasilikopoulos et al., 2019) to assign orthologous transcripts from each transcriptome to each COG. Orthology assignment of transcripts to each COG was performed with Orthograph v. 0.6.1 (Petersen et al., 2017). Subsequently, we followed procedures for amino-acid MSA, alignment refinement, outlier-sequence removal and removal of reference taxa before generating codon-based nucleotide MSAs (see supplementary information of Misof et al., 2014a for details on these procedures). We then used Baitfisher v. 1.2.7 (Mayer et al., 2016) to screen the codon-based MSAs for regions that are appropriate for bait design within the Adephaga clade (File S2). We conducted seven different tiling design experiments, corresponding to different lengths of bait regions, bait offsets and total number of baits in order to capture as many promising coding exons as possible while accounting for variable exon length, possibly large amount of missing data or hypervariable regions in some parts of the MSAs (see Mayer et al., 2016 for details of the procedure used by Baitfisher, File S1: Table S3). In order to exclude baits targeting multiple genomic regions in adephagan genomes, we filtered the resulting baits (separately for each tiling design experiment) by conducting a blast search against a draft genome assembly of the beetle Bembidion corgenoma (Gustafson et al., 2019, as Bembidion haplogonum, see File S2).
We then selected only one bait region per coding exon in each tiling design experiment: the one that required the minimum number of baits (Mayer et al., 2016). Subsequently, for those exons that were captured in multiple tiling design experiments only the longest bait region among experiments was considered (see File S2). In total, we inferred 49 786 bait sequences, each 120 bp long, for targeting 923 protein-coding exons from 651 genes. For the sake of simplicity, we refer to our approach as ‘exon capture’ in this study instead of ‘coding-exon capture’, even though in our procedure we intended to include and analyse only the protein-coding regions of the targeted exons (i.e., excluding 3′ and 5′ untranslated regions).
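The two-step choice of bait regions described above (fewest baits within each tiling design experiment, then the longest region across experiments) can be illustrated with a minimal Python sketch. It is not Baitfisher's implementation: the `BaitRegion` record, its field names and the example values are hypothetical, and ties are resolved by keeping the first candidate encountered, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaitRegion:
    """Hypothetical record for one candidate bait region (not a Baitfisher type)."""
    exon_id: str      # identifier of the coding exon
    experiment: str   # label of the tiling design experiment
    length_bp: int    # length of the bait region in bp
    n_baits: int      # number of baits required to tile the region


def select_bait_regions(candidates):
    """Return one bait region per exon, following the two steps in the text."""
    # Step 1: within each experiment, keep the region of each exon
    # that requires the minimum number of baits.
    best_per_experiment = {}
    for region in candidates:
        key = (region.experiment, region.exon_id)
        current = best_per_experiment.get(key)
        if current is None or region.n_baits < current.n_baits:
            best_per_experiment[key] = region

    # Step 2: across experiments, keep the longest remaining region per exon.
    best_per_exon = {}
    for region in best_per_experiment.values():
        current = best_per_exon.get(region.exon_id)
        if current is None or region.length_bp > current.length_bp:
            best_per_exon[region.exon_id] = region
    return best_per_exon


# Illustrative values only; they are not taken from the study.
candidates = [
    BaitRegion("exon1", "A", 240, 12),
    BaitRegion("exon1", "A", 240, 9),
    BaitRegion("exon1", "B", 360, 15),
    BaitRegion("exon2", "A", 120, 5),
]
selected = select_bait_regions(candidates)
for exon_id, region in sorted(selected.items()):
    print(exon_id, region.experiment, region.length_bp, region.n_baits)
```

Note that the first step can discard a longer region in favour of one that needs fewer baits, so the order of the two criteria matters.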

Tissue preservation, total genomic DNA extraction, next-generation sequencing library preparation and hybrid enrichment of protein-coding exons

Most specimens used for hybrid-enrichment of target genomic DNA (gDNA) were freshly collected and preserved in 96% ethanol but we also used a few dry pinned museum specimens (File S1: Table S4). Total gDNA was extracted from 95 specimens of Adephaga (File S1: Tables S2, S4) using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) and eluted in 100 μL nuclease-free water. Whenever available, voucher material has been deposited at the Zoologische Staatssammlung München (Zoological State Collection in Munich, Germany, see File S1: Table S4). Quality and quantity of the extracted gDNA were assessed with a Fragment Analyzer (Agilent Technologies Inc., Santa Clara, CA, U.S.A.) and a Quantus Fluorometer (Promega, Fitchburg, WI, U.S.A.). Whenever a sufficient amount of extracted DNA was available, we used 100 ng of DNA diluted in 10 μL for fragmentation before library preparation; otherwise, less than 100 ng was used. First, gDNA was sheared into fragments of 150–400 bp using a Bioruptor Pico sonication device (Diagenode s.a., Seraing, Belgium). Multiple shearing steps were performed for each sample until at least ∼90% of fragments were within the desired length interval. The quality and quantity of the fragmented gDNA were assessed with a Fragment Analyzer at the end of each shearing step. For library preparation, we followed the SureSelectXT2 Target Enrichment System Protocol for Illumina Paired-End Multiplexed Sequencing (Version E1 published in June 2015 by Agilent Technologies Inc.) with some minor modifications (Bank et al., 2017). Specifically, in the library preparation steps ‘End Repair’ and ‘A-tailing’, we reduced the reaction volume specified in Agilent’s protocol (pages 43–49 for 100 ng DNA samples) by 50% as described by Bank et al. (2017).
Subsequently, adapter ligation was performed with the NEBNext Quick Ligation Module and the adapters from the NEBNext Multiplex Oligos for Illumina (Dual Index Set1) kit. Next-generation sequencing (NGS) library PCR was then performed with the NEBNext Multiplex Oligos for Illumina and the NEBNext Q5 HotStart HiFi PCR Master Mix, to dual-index the libraries. Cycles of the NGS library PCR were adjusted as follows (due to the concentration measurements after ‘A-tailing’): 98 °C for 30 s, followed by 8–10 cycles of 98 °C for 10 s and 65 °C for 75 s, followed by 5 min at 65 °C, followed by 4 °C until the samples were removed from the thermocycler. Subsequently, all steps of the hybrid enrichment followed the protocol given by Bank et al. (2017) with modifications adjusted to the number of library pools and volume concentrations in our study (see File S2).


[Figure 1: workflow diagram with three panels. Panel ‘Sequencing, cleaning, assembly, orthology assignment’: sequencing of enriched genomic libraries (Illumina, NextSeq 500); quality-based filtering of raw reads and adapter trimming (Trimmomatic); assembly of sequenced genomic libraries (IDBA-UD); cross-contamination check of genomic assemblies (CroCo); transcriptomes (previously published and newly assembled); cross-contamination check of assembled transcriptomes (CroCo) and vector contamination screening; orthology assignment (Orthograph). Panel ‘Processing of individual COGs before concatenation’: keep only 651 genes (COGs) for which baits were designed; remove non-homologous fragments (MACSE, PREQUAL); multiple sequence alignment of amino-acid sequences (FSA); removal of amino-acid residues that do not align to transcriptomes and reference genomes and manual curation of MSAs; removal of reference taxa, outlier sequences and masking of randomly similar sections (ALISCORE). Panel ‘Calculation of hybrid-enrichment statistics’: map baits to filtered assemblies (BWA-mem); map clean pairs of reads to filtered assemblies (BWA-mem); calculate average coverage depth of different assembly regions (e.g., target vs. non-target, SAMtools); calculate enrichment statistics (i.e., Ct/Cn, Ct/Ca).]

Fig. 1. Summarized workflow of the steps that were followed to sequence, clean, assemble and combine the hybrid-capture sequence data with transcriptomes and to generate individual COGs. A short workflow for calculating the hybrid-enrichment statistics is also provided.

Sequencing and assembly of the enriched genomic libraries

The enriched genomic libraries for the 95 samples of Adephaga were paired-end sequenced (150 bp) on a single flow cell of an Illumina NextSeq 500 sequencer (Illumina Inc., San Diego, CA, U.S.A., Fig. 1). Sequenced raw reads of each genomic library were trimmed to remove Illumina adapter sequences and low-quality reads with Trimmomatic v. 0.38 (Bolger et al., 2014, see File S2 for options). Only full pairs of trimmed reads were used for de novo assembly of the enriched genomic libraries (File S1: Table S2). De novo assembly of each genomic library was performed with the software IDBA-UD v. 1.1.3 (see File S2 and Fig. 1) that is optimized to assemble genomic data with highly unequal coverage depth (Peng et al., 2012).

Calculation of hybrid-enrichment statistics

We calculated the ratio of average per-base coverage depth of target regions (Ct) divided by the average per-base coverage depth of the nontarget regions (Ct/Cn, File S1: Table S2 and Fig. 2) as an approximate measure of the enrichment success for each genomic library in our analyses. To identify the target regions, we first identified bait-binding regions in each assembled genomic library by mapping the bait nucleotide sequences to the clean assembly files (i.e., after putative cross-contaminated contigs had been removed) using the software BWA-mem v. 0.7.17 (Li & Durbin, 2009). Subsequently, we separately mapped the trimmed reads to the assemblies with the same version of BWA-mem. A summarized file with the coverage depth of each assembly position in each assembly was generated with SAMtools v. 1.7 (Li et al., 2009). We used a custom Python script and the IDs of the contigs that contained orthologous sequence (contigs assigned to any of the 651 target COGs, see below) to calculate the average per-base coverage depth of the bait-binding regions but only on those contigs that contained orthologous sequence (i.e., target regions, Ct, Fig. 1). We subsequently calculated the average per-base coverage depth of all remaining regions in the assembly for each genomic library (i.e., nontarget regions, Cn). Lastly, we calculated the average per-base coverage depth of the whole assembly for each assembled genomic library (Ca). Positions with zero coverage depth were excluded from the above calculations to avoid the inflation of enrichment statistics. We considered the statistics Ct/Cn and Ct/Ca as approximate measures of the enrichment success for each of the 95 genomic libraries (File S1: Table S2 and Figs 1, 2).

[Figure 2: box-plots. x-axis: Family (Carabidae, Cicindelidae, Dytiscidae, Gyrinidae, Haliplidae, Noteridae); y-axis: Ct/Cn ratio (axis ticks 0 to 200).]

Fig. 2. Box-plots of Ct/Cn ratios inferred separately for each family of Adephaga. The plots were calculated by pooling the ratios for species of the same family into the same box-plot.

We generated box-plots of these statistics separately for each adephagan family and performed pairwise Mann–Whitney–Wilcoxon tests between families in order to assess whether or not the values for different families were drawn from the same underlying distribution. The pairwise statistical tests were performed in R v. 3.6.3 (File S1: Table S5; R Core Team, 2020).
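As a rough illustration of the Ct/Cn and Ct/Ca statistics defined in this section, the following Python sketch computes them from per-position coverage depths while excluding zero-coverage positions. It is not the authors' custom script: the input layout (a dictionary keyed by hypothetical `(contig, position)` tuples and a set of target positions) and all example values are assumptions made for the example.

```python
def mean_nonzero(depths):
    """Mean of the positive depths; None when no position is covered."""
    covered = [d for d in depths if d > 0]
    return sum(covered) / len(covered) if covered else None


def enrichment_stats(depth_by_pos, target_positions):
    """Compute Ct, Cn, Ca and the ratios Ct/Cn and Ct/Ca for one library.

    depth_by_pos: dict mapping (contig, position) -> coverage depth
    target_positions: set of (contig, position) in bait-binding regions
        on contigs that contain orthologous sequence (assumed precomputed)
    """
    target = [d for key, d in depth_by_pos.items() if key in target_positions]
    nontarget = [d for key, d in depth_by_pos.items() if key not in target_positions]

    ct = mean_nonzero(target)                 # target regions
    cn = mean_nonzero(nontarget)              # all remaining regions
    ca = mean_nonzero(depth_by_pos.values())  # whole assembly

    def ratio(num, den):
        return num / den if num is not None and den is not None else None

    return {"Ct": ct, "Cn": cn, "Ca": ca,
            "Ct/Cn": ratio(ct, cn), "Ct/Ca": ratio(ct, ca)}


# Toy depths for a two-contig assembly; values are illustrative only.
depths = {
    ("c1", 1): 100, ("c1", 2): 120, ("c1", 3): 0,
    ("c2", 1): 2, ("c2", 2): 0, ("c2", 3): 4,
}
targets = {("c1", 1), ("c1", 2), ("c1", 3)}
stats = enrichment_stats(depths, targets)
print(stats)
```

Dropping zero-coverage positions before averaging mirrors the exclusion described in the text; including them would lower Cn and Ca more than Ct in a well-enriched library and so raise both ratios.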

Cross-contamination checks and orthology assignment

Putative cross-contaminated sequences or sequences of ambiguous origin within the assembled sequence-capture data were identified with the software package CroCo v. 1.1 (Simion et al., 2018). CroCo is primarily designed to screen RNA-seq data for contamination but can also potentially identify cross-contaminants from genomic data based on the assumption that the coverage of the contaminated contigs differs between the source library of contamination and the contaminated library respectively (see Simion et al., 2018 and also Mayer et al., 2016 for a similar approach). We considered contigs that were 99% similar over a fragment of 200 nucleotides as suspicious for cross-contamination (option: -tool K and otherwise default options). Contigs that were identified as putative contaminants as well as those of ambiguous origin were deleted from the assemblies before downstream analyses (see File S1: Table S6 and File S2 for cross-contamination checks applied for some of the transcriptomes). Orthology assignment of genomic fragments to each of the COGs of the ortholog set was performed with Orthograph v. 0.6.3 (Petersen et al., 2017). From the 3085 COGs of the ortholog set, we conservatively chose to analyse only the 651 COGs for which we had originally designed baits (File S1: Tables S1, S2). The Orthograph-reporter script was run with the ‘protein2dna’ exonerate model for all hybrid-capture data (File S1: Table S2), whereas the default ‘protein2genome’ model was used for all transcriptomes in the dataset (File S1: Table S1, see also File S2 for additional options).

7

Data filtering, MSA, outlier-sequence removal and masking of randomly similar sections
The output of Orthograph could still possibly contain non-exonic residues due to random extension of open reading frames beyond the protein-coding regions (Bank et al., 2017). Therefore, we followed additional procedures for filtering sequences within each COG. Specifically, we used the software MACSE v. 2.03 (Ranwez et al., 2018, option: -trimNonHomologous) to remove long individual sequence fragments that shared no homology with other sequences in each COG, such as those of possibly unidentified intronic fragments (Ranwez et al., 2018). The software PREQUAL v. 1.02 was subsequently used to remove shorter nonhomologous fragments such as those resulting from assembly artefacts or annotation errors (default parameters, Whelan et al., 2018). These filtering steps were applied at the nucleotide sequence level, and the resulting COGs (amino-acid COGs: aaCOGs, nucleotide COGs: nCOGs) were used for further downstream filtering. We used the software FSA v. 1.15.9 (option: -fast) to infer amino-acid MSAs for each filtered COG (Bradley et al., 2009). We selected the software FSA because it shows higher accuracy (i.e., lower false-positive alignment rate) than other MSA software and tends to leave nonhomologous amino-acid residues unaligned (Bradley et al., 2009). By aligning the amino-acid sequences with FSA, we greatly reduced the possibility of aligning nonhomologous fragments to each other. Subsequently, we filtered the aligned aaCOGs so that amino-acid residues from hybrid-enrichment data that did not align to amino-acid residues of at least one reference species (i.e., official gene set) and at least one transcriptome were masked with an ‘X’. Transcriptomic amino-acid residues that did not align to the protein-coding sequences of at least one reference taxon were also masked with an ‘X’.
As a last quality check, we manually curated all aligned aaCOGs to mask putative nonhomologous amino-acid fragments (see File S2). We used these filtered amino-acid alignments as a blueprint to generate corresponding codon-based nucleotide alignments with a modified version of PAL2NAL (Suyama et al., 2006) as described by Misof et al. (2014a). A custom Python script was then used to mask all corresponding codons of the previously masked amino acids with ‘NNN’. We performed additional identification and removal of individual outlier sequences in each aligned aaCOG based on BLOSUM62 expected distances among taxa (see Dietz et al., 2019 and File S2). We then removed all sequences of the reference taxa, except for the sequences of the two hymenopteran species (Harpegnathos saltator, Nasonia vitripennis) and those of Tribolium castaneum that we included as outgroups. Lastly, alignment sections of random similarity within each aaCOG were identified with ALISCORE v. 1.2 (Misof & Misof, 2009; Kück et al., 2010), as described by Vasilikopoulos et al. (2019), and were subsequently removed with ALICUT v. 2.31 (https://github.com/PatrickKueck/AliCUT, last access: 16 June 2020) both at the amino-acid and the nucleotide sequence levels. The filtered and aligned aaCOGs were finally concatenated into a supermatrix with FASconCAT-G v. 1.04 (Kück & Longo, 2014).
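The masking rule described in this section for hybrid-enrichment residues, and the matching ‘NNN’ codon masking, can be illustrated with a minimal sketch. This is not the authors' custom script: the function names, the dictionary-based alignment layout and the toy sequences are assumptions made for illustration only, and the second rule (for transcriptomic residues) is omitted for brevity.

```python
GAP = "-"

def has_residue(seqs, names, col):
    """True if any of the named sequences has a real residue in this column."""
    return any(seqs[name][col] not in (GAP, "X") for name in names)

def mask_hybrid_residues(seqs, reference, transcriptome, hybrid):
    """Mask hybrid-enrichment residues with 'X' in columns that lack support
    from at least one reference species AND at least one transcriptome."""
    length = len(next(iter(seqs.values())))
    masked = {name: list(seq) for name, seq in seqs.items()}
    for col in range(length):
        supported = (has_residue(seqs, reference, col)
                     and has_residue(seqs, transcriptome, col))
        if supported:
            continue
        for name in hybrid:
            if masked[name][col] != GAP:
                masked[name][col] = "X"
    return {name: "".join(chars) for name, chars in masked.items()}

def mask_codons(aa_seq, nt_seq):
    """Replace the codon under every masked ('X') amino acid with 'NNN'."""
    codons = [nt_seq[i:i + 3] for i in range(0, len(nt_seq), 3)]
    return "".join("NNN" if aa == "X" else codon
                   for aa, codon in zip(aa_seq, codons))

# Hypothetical toy alignment: one reference, one transcriptome, one hybrid-capture taxon.
toy = {"ref": "MK-L", "tx": "MKV-", "hyb": "MKVL"}
masked = mask_hybrid_residues(toy, ["ref"], ["tx"], ["hyb"])
masked_nt = mask_codons(masked["hyb"], "ATGAAAGTTCTG")
```

In the toy alignment, the last two columns each lack either the reference or the transcriptome residue, so the hybrid-capture residues there are masked and their codons become ‘NNN’.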

Supermatrix evaluation and optimization for phylogenetic analyses
We opted for an informative subset of the above-described amino-acid supermatrix by using the software MARE v. 0.1.2rc and by removing partitions with an information content of zero (IC = 0, Misof et al., 2013). After careful visual inspection of the resulting supermatrix (supermatrix A, Table 1) we observed that it still contained hypervariable alignment blocks. In addition, supermatrix A contained a large proportion of missing data (∼50%, Table 1), which can bias phylogenetic reconstructions if missing characters are not randomly distributed (Lemmon et al., 2009; Misof et al., 2014b). Additionally, supermatrix A showed evidence for deviation from the assumption of stationarity, reversibility and homogeneity (SRH) as measured with Bowker’s and Stuart’s tests of symmetry in SymTest v. 2.0.47 (Bowker, 1948; Stuart, 1955; Misof et al., 2014a, see Table 1). Therefore, we chose to filter supermatrix A by applying strategies designed to select conserved alignment sites, to reduce the degree of missing data and to reduce the potential effects of model violations in phylogenetic reconstructions (Misof et al., 2001; Sharma et al., 2014; Laumer et al., 2019). First, we identified and removed individual gene partitions that deviate from model assumptions using the -symtest option in IQ-TREE v. 2.0.4 (Naser-Khdour et al., 2019; Minh et al., 2020). The resulting filtered amino-acid supermatrix was then trimmed with the software BMGE v. 1.12 (h = 0.5, amino-acid replacement matrix: BLOSUM62) to remove hypervariable alignment sites (resulting in supermatrix D). We selected the software BMGE because it selects informative sites by inferring biologically realistic variability for each column of the alignment (Criscuolo & Gribaldo, 2010; Cai et al., 2020).
We also generated five additional and independent amino-acid supermatrices by directly trimming supermatrix A or the partitions of supermatrix A with BMGE in order to examine the effects of progressively more aggressive filtering on the phylogenetic results (see Table 1). Additional supermatrices were generated by using three degrees of stringency (h = 0.5, h = 0.4 and h = 0.3, see Table 1 and File S3: Fig. S1).
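To make the effect of the stringency threshold h concrete, the following sketch trims a toy alignment by column variability. It is a deliberate simplification: it uses plain Shannon entropy scaled by log(20), whereas BMGE computes a similarity-matrix-weighted score, so the numbers are not comparable to BMGE output. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

GAP_CHARS = {"-", "X", "?"}

def column_entropy(column):
    """Shannon entropy of one alignment column, scaled to [0, 1] by log(20).
    Columns without any residue score 0.0 and are therefore always kept."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(20)

def trim_alignment(seqs, h_max):
    """Keep only columns whose scaled entropy is <= h_max."""
    columns = list(zip(*seqs))
    keep = [i for i, col in enumerate(columns) if column_entropy(col) <= h_max]
    trimmed = ["".join(seq[i] for i in keep) for seq in seqs]
    return trimmed, keep

# Hypothetical toy alignment: columns 0 and 3 are invariant, column 4 is hypervariable.
toy = ["MKVLA", "MKILC", "MRVLD", "MKVLE"]
kept_h05 = trim_alignment(toy, 0.5)[1]
kept_h04 = trim_alignment(toy, 0.4)[1]
kept_h01 = trim_alignment(toy, 0.1)[1]
```

Lowering the threshold removes progressively more columns, which mirrors the decrease in alignment length from supermatrix E (h = 0.5) to F (h = 0.4) to J (h = 0.3) in Table 1.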


Table 1. Summarized statistics and description for each generated and analysed amino-acid supermatrix (see File S3: Fig. S1). Saturation statistics of each supermatrix (adjusted R2 and slope) based on the patristic and p-distances are also presented. Saturation of each supermatrix was also measured with the average pairwise lambda score (see text).

| Amino-acid supermatrix ID | No. of species | No. of alignment sites | P.I. sites | Percent. (%) of P.I. sites | Average pairwise λ (lambda) score | Adjusted R2 (SHETU) | Slope (SHETU) | Adjusted R2 (SHOMU) | Slope (SHOMU) | Adjusted R2 (SHOMP) | Ca |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 136 | 200 017 | 104 221 | 52.1% | 0.163 | – | – | – | – | – | 0.504 |
| B | 136 | 49 468 | 21 917 | 44.3% | 0.118 | 0.425 | 0.126 | 0.486 | 0.213 | 0.479 | 0.831 |
| C | 136 | 55 521 | 26 220 | 47.2% | 0.135 | 0.369 | 0.111 | 0.403 | 0.182 | 0.405 | 0.790 |
| D^a | 136 | 49 797 | 21 401 | 43.0% | 0.116 | 0.451 | 0.133 | 0.512 | 0.226 | N/A | 0.846 |
| D - recoded^a | 136 | 49 797 | 12 699 | 25.5% | 0.069 | – | – | – | – | – | 0.846 |
| E | 136 | 50 614 | 21 773 | 43.0% | 0.116 | 0.454 | 0.133 | 0.515 | 0.227 | N/A | 0.846 |
| F^a | 136 | 36 511 | 14 143 | 38.7% | 0.095 | 0.510 | 0.155 | 0.569 | 0.256 | N/A | 0.882 |
| G^a | 120 | 36 511 | 10 879 | 29.8% | 0.079 | 0.396 | 0.230 | 0.393 | 0.272 | N/A | 0.880 |
| H^a | 100 | 36 511 | 9658 | 26.5% | 0.074 | 0.570 | 0.247 | 0.575 | 0.306 | N/A | 0.892 |
| I^a | 136 | 29 361 | 11 711 | 39.9% | 0.104 | 0.418 | 0.135 | 0.480 | 0.225 | N/A | 0.857 |
| J^a | 136 | 23 442 | 7684 | 32.8% | 0.069 | 0.556 | 0.177 | 0.642 | 0.299 | N/A | 0.911 |

| Amino-acid supermatrix ID | Average p-dist | Median pairwise P value to the Bowker’s test | Median pairwise P value to the Stuart’s test | IC | Percent. (%) of pairwise P-values < 0.05, Bowker’s test | Percent. (%) of pairwise P-values < 0.05, Stuart’s test |
|---|---|---|---|---|---|---|
| A | 0.154 | 2.14E-02 | 7.38E-05 | 0.672 | 58.92% | 82.94% |
| B | 0.111 | 1.07E-01 | 1.15E-02 | 0.620 | 37.44% | 64.07% |
| C | 0.127 | 9.46E-02 | 6.73E-03 | 0.599 | 40.10% | 68.27% |
| D^a | 0.109 | 1.26E-01 | 1.19E-02 | N/A | 34.69% | 64.11% |
| D - recoded^a | 0.052 | 2.16E-01 | – | N/A | 24.67% | – |
| E | 0.109 | 1.22E-01 | 1.14E-02 | N/A | 35.02% | 64.19% |
| F^a | 0.089 | 1.99E-01 | 4.15E-02 | N/A | 24.98% | 51.94% |
| G^a | 0.074 | 2.27E-01 | 6.99E-02 | N/A | 20.35% | 45.27% |
| H^a | 0.070 | 2.34E-01 | 8.53E-02 | N/A | 18.85% | 41.92% |
| I^a | 0.098 | 1.75E-01 | 4.51E-02 | N/A | 25.59% | 50.94% |
| J^a | 0.065 | 2.96E-01 | 1.51E-01 | N/A | 13.97% | 35.21% |

| Amino-acid supermatrix ID | Description |
|---|---|
| A | Concatenated supermatrix of masked genes with ALISCORE after partitions with IC = 0 had been removed |
| B | Trimmed each gene partition of supermatrix A with BMGE, BLOSUM62, h = 0.4, keep only genes with length ≥50 amino-acid sites |
| C | Trimmed each partition of supermatrix A with BMGE, BLOSUM62, h = 0.5, keep only genes with length ≥80 amino-acid sites and ≤30% missing data |
| D^a | Removed genes that fail symmetry tests (IQ-TREE) from supermatrix A. Subsequently, trimmed resulting supermatrix with BMGE (h = 0.5, BLOSUM62) |
| D - recoded^a | Dayhoff-6 recoded version of supermatrix D |
| E | Trimmed supermatrix A with BMGE, BLOSUM62, h = 0.5 |
| F^a | Trimmed supermatrix A with BMGE, BLOSUM62, h = 0.4 |
| G^a | Removed distantly related outgroup species from supermatrix F |
| H^a | Removed fast evolving ingroup species (20 ingroup species with highest LB scores) from supermatrix G |
| I^a | Removed 50% of genes with the highest RCFV value from supermatrix A. Trimmed resulting supermatrix with BMGE, BLOSUM62, h = 0.5 |
| J^a | Trimmed supermatrix A with BMGE, BLOSUM62, h = 0.3 |

^a Analysed under the Bayesian site-heterogeneous model CAT + GTR + G4 (BSHETU). P.I.: parsimony informative, Ca: overall alignment completeness scores, IC: information content (MARE), p-dist: observed pairwise distances, N/A: not applicable, SHETU: site-heterogeneous unpartitioned, SHOMU: site-homogeneous unpartitioned, SHOMP: site-homogeneous partitioned.

Among-species compositional heterogeneity is a potential source of systematic error that is frequently associated with fast evolving sites (Foster, 2004; Kocot et al., 2017). We generated two amino-acid supermatrices by using two different approaches for reducing among-species compositional heterogeneity (i.e., Dayhoff6-recoding and removal of genes with high relative composition frequency variation, RCFV, see Table 1 and File S2). We also tested whether the removal of distantly related outgroup species or the removal of long-branched ingroup taxa (based on long-branch scores, LB scores, see File S2 and File S1: Table S7) affected the phylogenetic relationships.
We performed a series of statistical tests on each generated supermatrix in order to evaluate its suitability for phylogenetic reconstruction (Table 1). First, we inferred substitution saturation plots for most analysed supermatrices (Table 1 and File S2, see Misof et al., 2001; Nosenko et al., 2013) by calculating pairwise amino-acid p-distances and pairwise patristic distances. We also inferred an alternative measure of substitution saturation that is independent of the patristic distances of the inferred trees: the average lambda score for each supermatrix (i.e., λ, ranging from 0.0 to 1.0), which was recently introduced for pairs of aligned sequence data (higher values indicate a higher degree of saturation, Jermiin & Misof, 2020). All pairwise λ scores in each supermatrix were calculated with the software SatuRation v. 1.0 (available from: https://github.com/lsjermiin/SatuRation.v1.0, last access: 5 January 2021, Jermiin & Misof, 2020). We also measured the overall deviation from SRH conditions with the software SymTest v. 2.0.47 (current version available at https://github.com/ottmi/symtest, last access: 20 April 2020, see Misof et al., 2014a) for each filtered supermatrix and for the original supermatrix A by applying Bowker’s and Stuart’s tests of symmetry (Table 1). Additionally, we calculated the overall completeness scores of the analysed supermatrices and generated heatmaps of pairwise completeness scores with AliStat v. 1.11 (Wong et al., 2020, Table 1). Lastly, we screened each generated supermatrix for taxa with heterogeneous sequence divergence by generating heatmaps of pairwise mean similarity scores with ALIGROOVE v. 1.06 (Kück et al., 2014).
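As an illustration of the pairwise symmetry tests mentioned above, the sketch below computes the Bowker test statistic and its degrees of freedom for one pair of aligned sequences. It is not the SymTest implementation: the function name and the toy sequences are hypothetical, and the conversion to a P value (a chi-squared tail probability) is left out to keep the example dependency-free.

```python
from itertools import combinations

def bowker_statistic(seq_a, seq_b, skip="-X?"):
    """Bowker's test of symmetry for one sequence pair.

    Builds the divergence matrix N, where N[i, j] counts sites with state i in
    seq_a and state j in seq_b, then sums (N[i,j] - N[j,i])^2 / (N[i,j] + N[j,i])
    over all off-diagonal state pairs that were observed at least once.
    Returns (statistic, degrees_of_freedom).
    """
    counts = {}
    for a, b in zip(seq_a, seq_b):
        if a in skip or b in skip:
            continue  # ignore gaps and ambiguous or masked residues
        counts[(a, b)] = counts.get((a, b), 0) + 1
    states = sorted({state for pair in counts for state in pair})
    statistic, df = 0.0, 0
    for i, j in combinations(states, 2):
        n_ij = counts.get((i, j), 0)
        n_ji = counts.get((j, i), 0)
        if n_ij + n_ji > 0:
            statistic += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
            df += 1
    return statistic, df

# Hypothetical sequence pairs: the first is asymmetric (4 A->C vs 2 C->A), the second symmetric.
asymmetric = bowker_statistic("AAAACC", "CCCCAA")
symmetric = bowker_statistic("ACAC", "CACA")
```

Under SRH conditions the divergence matrix is expected to be symmetric, so large statistics relative to the degrees of freedom indicate a violation for that sequence pair. Table 1 summarizes such pairwise results as median P values and as the percentage of pairs with P < 0.05.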

Concatenation-based phylogenetic analyses of amino-acid supermatrices
Modelling site-specific propensities of amino acids has been shown to be more important than modelling partition-wise heterotachy in concatenation-based phylogenomic analyses (Feuda et al., 2017; Wang et al., 2019). In order to account for site-specific amino-acid preferences in the supermatrices, we analysed most amino-acid supermatrices under the site-heterogeneous model CAT + GTR + G4 (Bayesian site-heterogeneous model, BSHETU) using the software PhyloBayes MPI v. 1.8 (Table 1, Lartillot et al., 2013). Two independent MCMC chains were run for each dataset until more than 20 000 samples were collected or until convergence (File S1: Table S8). We also analysed the amino-acid supermatrices using a maximum likelihood approach (ML) with IQ-TREE v. 1.6.12 (Nguyen et al., 2015). We first selected the best-fitting substitution models in ModelFinder based on the AICc criterion on the unpartitioned matrices (File S1: Table S9; Akaike, 1974; Kalyaanamoorthy et al., 2017). In order to test the relative fit of site-heterogeneous versus site-homogeneous models, we also included empirical site-heterogeneous profile mixture models in our model-selection procedure (i.e., C20, C40, C60, Quang et al., 2008). In total, more than 270 models were tested on each of supermatrices B–J (unpartitioned data) except for the recoded dataset, which was only analysed with the BSHETU model. For the partitioned supermatrices (B, C, Table 1), we also calculated an optimal partitioning scheme using an edge-linked partition model using the same version of IQ-TREE (File S2, Chernomor et al., 2016; Lanfear et al., 2014). For these supermatrices, we assessed the relative model fit of site-homogeneous unpartitioned (SHOMU), site-homogeneous partitioned (SHOMP) and site-heterogeneous unpartitioned (SHETU) models by using a fixed neighbour-joining tree (File S1: Table S10 and File S2). Phylogenetic tree inference was performed for each supermatrix with the SHOMU, SHETU, PMSF (posterior mean-site frequency profile model as an approximation to the C60 SHETU model, File S2, Wang et al., 2018) and SHOMP models (where applicable). This was done in order to explore the extent to which using a suboptimal model affected phylogenetic reconstructions (File S1: Tables S9, S10). Statistical branch support of the inferred relationships in all concatenation-based ML analyses was estimated based on 2000 ultrafast bootstrap (UFB) replicates (Hoang et al., 2018). As a complementary measure of support, we inferred quartet-concordance scores (QC) with Quartet Sampling v. 1.3.1 (Pease et al., 2018, option: -nreps 150) on the tree that resulted from the SHETU-based analysis of supermatrix D (Fig. 3). For inferring QC, we used a site-heterogeneous but less complex model than the one used to infer the tree (i.e., JTT + C20 + F + R8 instead of JTT + C60 + F + R8, due to computational limitations, and using the same version of IQ-TREE, File S3: Fig. S2). Lastly, we calculated pairwise Robinson–Foulds (RF) distances among the trees inferred under the same model (SHOMU, SHETU and PMSF) for amino-acid datasets with full taxon sampling using ETE v. 3.1.1 (Huerta-Cepas et al., 2016).
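The normalized RF distance used to compare trees can be sketched without any phylogenetics library. The example below is not the ETE implementation: trees are written as nested tuples of leaf names rather than parsed from Newick files, the function names are hypothetical, and both trees are assumed to share the same leaf set.

```python
def leaf_set(tree):
    """All leaf names below a node (a leaf is a string, an internal node a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Non-trivial splits of the unrooted version of a nested-tuple tree."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # each split is stored as the side without this leaf
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        side = leaves if anchor not in leaves else all_leaves - leaves
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return leaves

    visit(tree)
    return splits

def normalized_rf(tree_a, tree_b):
    """Symmetric difference of splits divided by the total number of splits.
    For two fully resolved trees with n leaves the denominator equals 2(n - 3)."""
    splits_a, splits_b = bipartitions(tree_a), bipartitions(tree_b)
    max_rf = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / max_rf if max_rf else 0.0

# Hypothetical five-taxon trees that differ in the position of taxa B and C.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
distance = normalized_rf(t1, t2)
```

A value of 0 means identical unrooted topologies and 1 means that the two trees share no non-trivial split. The two toy trees share one of their two splits.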

Phylogenetic analyses of nucleotide sequence data
To assess the stability of phylogenetic results among analyses of different types of data, we also generated and analysed four supermatrices at the nucleotide sequence level (File S1: Table S11). Analyses of these supermatrices were performed with the same version of IQ-TREE and by selecting best-fitting SHOMP and SHOMU models (see File S2 and File S1: Table S11). We also inferred phylogenetic relationships using a model that accounts for heterotachy among sequences but has only been extensively tested in analyses of nucleotide sequence data (see File S1: Table S11, Crotty et al., 2020).

Estimating alternative and confounding signals in supermatrices via four-cluster likelihood mapping and data permutations In addition to the quartet-concordance measure, we applied the four-cluster likelihood mapping approach (FcLM, Strimmer & von Haeseler, 1997) to assess the robustness of phylogenetic results, and to measure the strength of alternative phylogenetic signals with respect to specific phylogenetic hypotheses that resulted from the analyses of supermatrix D (Fig. 3 and Table 2). The hypotheses that we tested were the following: (i) Hygrobiidae is sister to a clade of Amphizoidae + Aspidytidae (hypothesis 1) and (ii) Cicindelidae is the sister group of Carabidae (hypothesis 2). FcLM analyses were performed on different amino-acid supermatrices that were trimmed with different degrees of stringency and were based on both SHETU and SHOMU models, in an attempt to assess whether model misspecification affected the phylogenetic signal in favour

[Figure 3: phylogenetic tree of Adephaga and outgroup taxa; clade labels: Outgroups, Geadephaga, "Hydradephaga"; node symbols denote ultrafast bootstrap support classes (100%, 95–99%, 85–94%, 75–84%, 65–74%).]

Fig. 3. Phylogenetic relationships of Adephaga as they resulted from the analysis of supermatrix D under the JTT + C60 + F + R8 site-heterogeneous model (i.e., SHETU model). Circles on tree nodes indicate branch support based on 2000 ultrafast bootstraps (UFB). All beetle photos were provided by M. Balke.

Table 2. Detailed results of the four-cluster likelihood mapping analyses (FcLM) for the two examined phylogenetic hypotheses. Results (i.e., percentages) are shown only for the fully resolved quartets (i.e., quartets falling within the corner areas of the triangular Voronoi diagrams, see Strimmer & von Haeseler, 1997).

Hypo1 (25 296 quartets)

| Model | Supermatrix | Given topology (supermatrix D): (Hyg. + Amp. + Asp.), (Dyt. + Rem.) | Alternative topology: (Hyg. + Dyt.), (Rem. + Amp. + Asp.) | Alternative topology: (Hyg. + Rem.), (Dyt. + Amp. + Asp.) | Total resolved quartets (%) |
|---|---|---|---|---|---|
| SHETU model (original data) | D | 59.80% | 8.10% | 28.80% | 96.70% |
| SHETU model (original data) | E | 58.80% | 8.30% | 29.60% | 96.70% |
| SHETU model (original data) | F | 53.70% | 11.70% | 29.00% | 94.40% |
| SHETU model (original data) | J | 44.00% | 10.60% | 36.00% | 90.60% |
| SHETU model (permuted data) | D | 10.70% | 43.60% | 43.40% | 97.70% |
| SHETU model (permuted data) | E | 8.80% | 48.00% | 39.40% | 96.20% |
| SHETU model (permuted data) | F | 36.60% | 34.50% | 18.40% | 89.50% |
| SHETU model (permuted data) | J | 3.30% | 2.40% | 3.90% | 9.60% |
| SHOMU model (original data) | D | 65.30% | 6.30% | 27.30% | 98.90% |
| SHOMU model (original data) | E | 64.40% | 6.60% | 27.90% | 98.90% |
| SHOMU model (original data) | F | 61.50% | 8.60% | 28.20% | 98.30% |
| SHOMU model (original data) | J | 51.00% | 9.10% | 36.60% | 96.70% |

Hypo2 (30 912 quartets)

| Model | Supermatrix | Given topology (supermatrix D): (Cici. + Cara.), (Tr. + Rem.) | Alternative topology: (Tr. + Cara.), (Cici. + Rem.) | Alternative topology: (Tr. + Cici.), (Cara. + Rem.) | Total resolved quartets (%) |
|---|---|---|---|---|---|
| SHETU model (original data) | D | 76.50% | 12.60% | 5.90% | 95.00% |
| SHETU model (original data) | E | 75.90% | 12.80% | 6.20% | 94.90% |
| SHETU model (original data) | F | 68.10% | 16.60% | 7.30% | 92.00% |
| SHETU model (original data) | J | 50.20% | 21.90% | 13.80% | 85.90% |
| SHETU model (permuted data) | D | 32.50% | 28.60% | 35.10% | 96.20% |
| SHETU model (permuted data) | E | 29.10% | 24.20% | 42.60% | 95.90% |
| SHETU model (permuted data) | F | 28.80% | 22.00% | 38.00% | 88.80% |
| SHETU model (permuted data) | J | 2.50% | 3.50% | 11.20% | 17.20% |
| SHOMU model (original data) | D | 67.20% | 24.80% | 5.70% | 97.70% |
| SHOMU model (original data) | E | 66.80% | 25.10% | 5.80% | 97.70% |
| SHOMU model (original data) | F | 60.90% | 29.90% | 6.10% | 96.90% |
| SHOMU model (original data) | J | 48.30% | 35.00% | 11.40% | 94.70% |

Note: for the FcLM analyses we only included supermatrices that are comparable with respect to the effects of data trimming because they resulted from direct trimming of supermatrix A. Supermatrix D resulted from trimming a slightly different version of supermatrix A from which only 12 genes had been removed. Amp.: Amphizoidae, Asp.: Aspidytidae, Hyg.: Hygrobiidae, Dyt.: Dytiscidae, Rem.: remaining species, Cici.: Cicindelidae, Cara.: Carabidae, Tr.: Trachypachidae, SHETU: site-heterogeneous unpartitioned, SHOMU: site-homogeneous unpartitioned.

of specific hypotheses (Table 2). In addition, FcLM analyses under the better-fitting SHETU models were performed with permutations of data (i.e., randomization of phylogenetic signal, permutation no. I, see Misof et al., 2014a), in order to assess whether or not the FcLM support for a particular inferred relationship under the SHETU models resulted from misleading signal (Table 2, Misof et al., 2014a).
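The way a single quartet is assigned to a region of the likelihood-mapping triangle can be sketched as follows. This is an illustrative reading of the Voronoi-region idea (nearest of seven attractor points), not the code used for the FcLM analyses. The region labels, function names and log-likelihood values are hypothetical.

```python
import math

# Seven attractors of the triangle: three corners (fully resolved), three edge
# midpoints (partly resolved) and the centre (unresolved, star-like).
ATTRACTORS = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T1/T2": (0.5, 0.5, 0.0),
    "T1/T3": (0.5, 0.0, 0.5),
    "T2/T3": (0.0, 0.5, 0.5),
    "star": (1 / 3, 1 / 3, 1 / 3),
}

def posterior_weights(log_likelihoods):
    """Convert the three quartet-topology log-likelihoods into weights summing to 1."""
    best = max(log_likelihoods)
    raw = [math.exp(value - best) for value in log_likelihoods]  # avoids underflow
    total = sum(raw)
    return tuple(value / total for value in raw)

def classify_quartet(log_likelihoods):
    """Assign a quartet to the region of the nearest attractor."""
    weights = posterior_weights(log_likelihoods)
    return min(ATTRACTORS, key=lambda name: math.dist(weights, ATTRACTORS[name]))

# Hypothetical log-likelihoods for three quartets.
resolved = classify_quartet((-1000.0, -1010.0, -1012.0))
partly_resolved = classify_quartet((-1000.0, -1000.1, -1015.0))
unresolved = classify_quartet((-1000.0, -1000.0, -1000.0))
```

Only quartets falling in one of the three corner regions count as fully resolved, which is the subset summarized as percentages in Table 2.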

Summary coalescent phylogenetic analyses To explore the sensitivity of our concatenation-based analyses to the putative effects of incomplete lineage sorting (ILS), we conducted summary coalescent phylogenetic analyses (SCAs) with ASTRAL III v. 5.7.3 (Zhang et al., 2018a). As SCAs are prone to gene-tree estimation errors (Mirarab et al., 2016; Sayyari et al., 2017) we took steps to reduce these effects on our analyses. Alignment trimming methods have been shown to be detrimental in phylogenetic inference of gene trees (Tan et al., 2015) and therefore we selected the unmasked amino-acid alignments for these analyses (before trimming with ALISCORE, Fig. 1, File S3: Fig. S1). However, in order to reduce the negative effects of fragmentary sequences (Sayyari et al., 2017), which are common for hybrid-capture data (Hosner et al., 2016), we (i) removed alignment sites with more than or equal to 50% ambiguous characters, and then (ii) removed sequences for which more than 75% of sequence length contained ambiguous characters. Finally, we kept only genes that had a length of at least 150 amino acids and less than 50% missing data. The filtering tasks were performed with custom Perl scripts. In total, 348 filtered gene alignments were used for SCA. Gene trees were inferred after selecting the best-fitting models (SHOMU models) with the same version of IQ-TREE (see File S2). Branch support of individual gene trees was calculated based on 10 000 SH-aLRT replicates (Guindon et al., 2010; Simmons & Kessenich, 2020). SCAs were then conducted with ASTRAL after collapsing weakly supported branches ( 0.30), whereas analyses of supermatrix F have reached the convergence value of maxdiff. = 0.307 (considered acceptable in our study). N.I.: not inferred, N.A.: not applicable, SHETU: site-heterogeneous unpartitioned, SHOMP: site-homogeneous partitioned, SHOMU: site-homogeneous unpartitioned, BSHETU: Bayesian CAT+ GTR + G4 model.

(Fig. 3 and File S3: Figs S20–S24). Within Geadephaga, the monophyly of tiger beetles (Cicindelidae) and their placement as sister to monophyletic ground beetles (Carabidae) were inferred in most analyses under the site-heterogeneous models (BSHETU, SHETU, PMSF, Table 3; File S3: Figs S5–S19, S34–S42) and were also supported by analyses of nucleotide sequence data (File S3: Figs S20–S24). In contrast, a clade Trachypachidae + Carabidae was only obtained in analyses of supermatrix J and only under conditions of model misspecification (i.e., SHOMU model) or under the PMSF approximation, yet without strong statistical branch support (Table 3; File S3: Figs S33, S42).
Concerning the inferred position of the family Hygrobiidae, all ML analyses under the better-fitting SHETU models supported a clade of Hygrobiidae + (Amphizoidae + Aspidytidae), most of them with strong UFB support (e.g., Fig. 3). The QC score also strongly supported this clade (File S3: Fig. S2, QC = 0.33). UFB support in favour of this clade under SHETU models was lower when more stringent trimming criteria were applied, but the inference of this clade remained robust to the selection of dataset when a SHETU model was applied (Table 3). On the other hand, analyses under the SHOMU and SHOMP models were inconsistent regarding this hypothesis (Table 3). Specifically, SHOMU analyses of the most stringently trimmed supermatrix under full taxon sampling (supermatrix J) supported a clade Dytiscidae + (Amphizoidae + Aspidytidae) as sister to Hygrobiidae but not with strong statistical branch support (Table 3). In general, progressive trimming with more stringent criteria resulted in a shift from a strongly or moderately supported Hygrobiidae + (Amphizoidae + Aspidytidae) clade (supermatrices D and E) to a poorly supported Dytiscidae + (Amphizoidae + Aspidytidae) clade (supermatrices F and J), but only in conditions of model misspecification (SHOMU models). This pattern is also observed under the BSHETU model but only for the most stringently trimmed supermatrix (supermatrix J, Table 3). Phylogenetic analyses with the PMSF approximation to the SHETU model (using a SHOMU-based guide tree with a different topology, Table 3) restored the monophyly of Hygrobiidae + (Amphizoidae + Aspidytidae) for most supermatrices (e.g., supermatrices F, G, I, J, but not for supermatrix C). This suggests that the clade Dytiscidae + (Amphizoidae + Aspidytidae) inferred under SHOMU models for these supermatrices is likely an artefact due to model misspecification. Overall, a clade that includes Amphizoidae, Aspidytidae and Dytiscidae is never strongly supported even in the few instances that it is inferred under a site-heterogeneous model (BSHETU or PMSF, Table 3; File S3: Figs S16, S18, S19, S35, S43).
Additional support for a clade Hygrobiidae + (Amphizoidae + Aspidytidae) comes from the results after removing distant outgroups and long-branched ingroup taxa from supermatrix F. Specifically, removing only distantly related outgroup taxa did not result in strong UFB support for this clade under the SHETU model (93%, Table 3), but when long-branched ingroup taxa were also removed, UFB support for the above-mentioned clade increased under the same model (98%). Additionally, the topology flipped from the clade Dytiscidae + (Amphizoidae + Aspidytidae) to the clade Hygrobiidae + (Amphizoidae + Aspidytidae) under the SHOMU and BSHETU models when long-branched ingroup species were removed (although not with strong support under the BSHETU model, Table 3). This suggests that removal of distant outgroups without also accounting for branch-length heterogeneity of the ingroup might result in an erroneous topology even when a site-heterogeneous model is used.
Phylogenetic analyses of the Dayhoff6-recoded matrix D recovered unexpected and poorly supported clades with respect to the internal phylogeny of Dytiscoidea and more generally Adephaga (e.g., Gyrinidae + Geadephaga and Amphizoidae + Dytiscidae with low support, File S3: Fig. S14). Although the BSHETU analyses of the recoded matrix did not reach convergence (File S3: Fig. S14 and File S1: Table S8, maxdiff = 0.49, more than 29 000 samples per MCMC chain), these observations suggest that amino-acid data-recoding might be detrimental when excessive alignment trimming and data filtering have been applied before recoding the data.
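For readers unfamiliar with Dayhoff-6 recoding, the sketch below shows the transformation on a single sequence. The six amino-acid groups are the standard Dayhoff categories. The digit labels, the function name and the treatment of gaps and ambiguous residues are arbitrary choices made for this illustration.

```python
# Standard Dayhoff-6 groups; the digits used as state labels are arbitrary.
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
RECODE = {aa: code for code, group in DAYHOFF6_GROUPS.items() for aa in group}

def dayhoff6_recode(seq, missing="-"):
    """Map each amino acid to its Dayhoff-6 state; anything else becomes missing."""
    return "".join(RECODE.get(aa, missing) for aa in seq.upper())

recoded = dayhoff6_recode("MKVLAC-X")
```

Because 20 states collapse into 6, substitutions within a group become invisible. This is consistent with the drop in parsimony-informative sites from 21 401 in supermatrix D to 12 699 in its recoded version (Table 1).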

Internal phylogeny of Carabidae, Cicindelidae, Dytiscidae and Gyrinidae based on analyses of concatenated sequence data
Analyses of amino-acid and nucleotide supermatrices in a concatenation framework yielded the monophyly of all subfamilies in Dytiscidae (Fig. 3). However, phylogenetic relationships among these dytiscid subfamilies were unstable and, with a few exceptions, not consistently resolved across analyses. For instance, the subfamily Hydrodytinae was always inferred as sister to Hydroporinae with strong UFB and QC support (Fig. 3 and File S3: Fig. S2). The subfamilies Coptotominae and Lancetinae were always inferred as sister groups (Fig. 3; File S3: Figs S5–S19). In addition, all concatenation-based analyses resulted in a clade that includes all subfamilies of Dytiscidae excluding Coptotominae, Laccophilinae and Lancetinae with strong UFB and QC support (Fig. 3 and File S3: Figs S2, S5–S44). Specifically, most analyses with the SHETU models recovered Lancetinae + Coptotominae as sister to Laccophilinae + remaining Dytiscidae (Fig. 3; File S3: Figs S5–S12). In addition, most analyses of amino-acid supermatrices suggested the placement of Copelatinae as sister to a clade Matinae + (Hydrodytinae + Hydroporinae) (Fig. 3; File S3: Figs S5–S19, S25–S44). Lastly, the clades Agabinae + Colymbetinae and Cybistrinae + Dytiscinae were inferred consistently with strong support (Fig. 3; File S3: Figs S5–S44).
Concerning the internal phylogeny of Cicindelidae, the tribe Manticorini was inferred as sister to all remaining Cicindelidae in all concatenation-based analyses (Fig. 3; File S3: Figs S5–S44). This result received high QC or high UFB support across concatenation-based analyses (File S3: Figs S2, S5–S44). Although a paraphyletic Manticorini was inferred in a few instances, this result was likely an artefact due to the extremely high degree of missing data for the species Manticora latipennis (File S1: Table S2). The tribe Megacephalini was placed as sister to all remaining Cicindelidae except Manticorini, whereas the tribe Collyridini was inferred as sister to a clade that included Cicindelini and Oxycheilini (Fig. 3 and File S3: Figs S5–S19, S25–S44).
In contrast to Cicindelidae, the internal phylogeny of the megadiverse Carabidae remained largely unstable across analyses of different supermatrices and models (File S3: Figs S5–S44). However, some relationships were robustly inferred. For instance, the subfamily Trechinae was always inferred as sister to Brachininae + monophyletic Harpalinae, whereas the subfamilies Paussinae, Rhysodinae and Siagoninae were placed in a monophyletic group close to the base of the tree of Carabidae in analyses of amino-acid supermatrices (Fig. 3 and File S3: Figs S5–S19, S25–S44). Lastly, Carabinae was inferred as sister to Nebriinae in most phylogenetic analyses of amino-acid supermatrices (Fig. 3 and File S3: Figs S5–S19, S25–S44).
Within Gyrinidae, a strongly supported clade Dineutini + Orectochilini (as sister to Gyrinini) was inferred in concatenation analyses of different types of data and models (Fig. 3 and File S3: Figs S5–S44). Dineutini was inferred as paraphyletic with respect to Orectochilini in analyses of amino-acid supermatrices but not always with strong UFB support (Fig. 3). Additionally, the inferred QC score did not support a paraphyletic Dineutini (QC = −0.1, File S3: Fig. S2). Analyses of nucleotide sequence data mostly suggested a monophyletic Dineutini as sister to Orectochilini, but monophyly of Dineutini was only strongly supported in one analysis of nucleotide sequence data (supermatrix D_nt, File S1: Table S11 and File S3: Fig. S24).

Comparison of different schemes of evolutionary modelling and the predictability of substitution saturation
In total, 277 models were tested on each unpartitioned amino-acid supermatrix with ModelFinder. The results show that SHETU models significantly outperformed the best SHOMU models for all supermatrices in an unpartitioned context (File S1: Table S9). All the best-fitting SHETU models included 60 categories of fixed empirical amino-acid frequencies (i.e., C60 site-heterogeneous models), suggesting that the most complex SHETU models fitted the data better even for the most stringently trimmed supermatrices (e.g., supermatrices F and J, File S1: Table S9). Comparison of the optimal partitioning schemes (SHOMP) for supermatrices B and C with the complex SHETU models showed that site-heterogeneous models (SHETU) fitted these datasets better than both partitioned and unpartitioned site-homogeneous models (SHOMP and SHOMU, File S1: Table S10). Given that SHETU models fit the data better, the inferred saturation statistics indicate that using a site-homogeneous model (SHOMP or SHOMU) resulted in underestimation of the amount of substitution saturation in the amino-acid supermatrices when a measure that is dependent on patristic distances was used (i.e., adjusted R2, Table 1).
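The saturation statistics reported in Table 1 (slope and adjusted R2) come from regressing pairwise distances against each other. A minimal sketch of that calculation is given below, assuming that p-distances are regressed on patristic distances. The axis orientation, the function name and all distance values are assumptions for illustration.

```python
def saturation_regression(patristic, p_dist):
    """Ordinary least-squares fit of p-distances on patristic distances.
    Returns (slope, adjusted R2) for a model with one predictor."""
    n = len(patristic)
    mean_x = sum(patristic) / n
    mean_y = sum(p_dist) / n
    sxx = sum((x - mean_x) ** 2 for x in patristic)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(patristic, p_dist))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(patristic, p_dist))
    ss_tot = sum((y - mean_y) ** 2 for y in p_dist)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, adjusted_r2

# Hypothetical distances: the first set is perfectly linear, the second levels off.
linear = saturation_regression([0.1, 0.2, 0.3, 0.4], [0.05, 0.10, 0.15, 0.20])
saturated = saturation_regression([0.1, 0.2, 0.4, 0.8], [0.09, 0.17, 0.28, 0.35])
```

Because patristic distances depend on the model used to infer the tree, the same alignment can yield different slopes and adjusted R2 values under SHETU and SHOMU models, which is the comparison drawn in Table 1.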

© 2021 The Authors. Systematic Entomology published by John Wiley & Sons Ltd on behalf of Royal Entomological Society. doi: 10.1111/syen.12508

Stability of inferred relationships of Adephaga across analyses with different evolutionary models

We calculated all pairwise normalized RF distances among trees inferred under the same model (SHOMU, SHETU or PMSF) for those amino-acid datasets with full taxon sampling (seven trees per model, supermatrices B, C, D, E, F, I, J, Fig. 5). We assessed whether topological distances between inferred trees differ when using different evolutionary models. Although RF distances of inferred trees did not significantly differ between PMSF and SHOMU models (P value = 0.237, Mann–Whitney–Wilcoxon test with continuity correction) or between PMSF and SHETU models (P value = 0.136, Mann–Whitney–Wilcoxon test with continuity correction), RF distances of inferred trees were lower in analyses of SHETU models when compared with the SHOMU models (P value = 0.013, Mann–Whitney–Wilcoxon test with continuity correction, Fig. 5). This result is congruent with the consistent inference of the clade Hygrobiidae + (Amphizoidae + Aspidytidae) under SHETU models, which was instead not consistently inferred under the SHOMU models, and constitutes further evidence that full site-heterogeneous empirical mixture models (C60, ML-based) result in greater stability of the inferred relationships than the less complex SHOMU models (Table 3 and Fig. 5).

Fig. 5. Box-plots of all pairwise normalized Robinson–Foulds distances among trees that were inferred from different amino-acid supermatrices under the same type of model (normalized distances, only maximum likelihood analyses). We only included distances among trees that were inferred with full taxon sampling (i.e., supermatrices: B, C, D, E, F, I, J). SHETU: site-heterogeneous unpartitioned model, PMSF: posterior mean-site frequency profile model, SHOMU: site-homogeneous unpartitioned model. [Plot axes: y, pairwise normalized RF distance (0.00–0.15); x, model (PMSF, SHETU, SHOMU).]
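The stability comparison summarized in Fig. 5 rests on pairwise normalized Robinson–Foulds distances among trees inferred under the same model. The following Python sketch shows one way such distances could be computed; it assumes fully resolved unrooted trees over identical taxon sets, encoded as nested tuples of leaf names, and all function names and toy trees are hypothetical rather than those used in the study.

```python
from itertools import combinations


def leaves(tree):
    """Set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference taxon, so the position of the root does not matter."""
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def normalized_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by its maximum, 2 * (n - 3),
    for two fully resolved unrooted trees on the same n taxa."""
    taxa = leaves(tree_a)
    if taxa != leaves(tree_b) or len(taxa) < 4:
        raise ValueError("trees must share the same taxa (at least four)")
    return len(splits(tree_a) ^ splits(tree_b)) / (2 * (len(taxa) - 3))


def pairwise_normalized_rf(trees):
    """All pairwise distances among a list of trees (one model)."""
    return [normalized_rf(a, b) for a, b in combinations(trees, 2)]


# Toy example with five taxa (hypothetical topologies).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))  # same unrooted topology as t1
print(pairwise_normalized_rf([t1, t2, t3]))
```

With seven trees per model, this procedure yields 21 pairwise distances per model. Two such sets of distances could then be compared with a rank-based test, for example `scipy.stats.mannwhitneyu(x, y, use_continuity=True)`, if SciPy is available.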

Effects of removing hypervariable sites, distantly related outgroups and long-branched taxa on the statistical properties of amino-acid supermatrices

Removal of hypervariable sites had a positive impact on the statistical properties of amino-acid supermatrices in terms of eliminating potential confounding factors (Table 1). In particular, trimming the supermatrices with BMGE resulted in reduction of total and pairwise missing data (Table 1 and File S3: Figs S45–S54) and reduced deviation from SRH conditions as indicated by the reduced percentage of pairwise comparisons that failed the corresponding symmetry tests in the analysed supermatrices (Table 1, Bowker’s test: 35.02%, 24.98% and 13.97% failed tests in supermatrices E, F and J respectively, see File S3: Figs S55–S65). Additionally, progressive removal of hypervariable sites resulted in progressively increasing completeness of the supermatrices (Ca scores: 0.846, 0.882 and 0.911 for supermatrices E, F and J respectively, Table 1, File S3: Figs S45–S54). Supermatrices D and E did not significantly differ when comparing their statistical properties because only 12 genes from supermatrix A failed the symmetry tests in IQ-TREE and had therefore been removed before trimming (Table 1). Pairwise alignment similarity scores of taxa and indices for substitution saturation also improved with BMGE trimming (supermatrices D, E, F and J, Table 1 and File S3: Figs S66–S95), suggesting that progressively removing hypervariable sites results in progressively less saturated supermatrices (supermatrices D, E, F and J). The average λ scores within each supermatrix also showed that progressive removal of hypervariable sites resulted in supermatrices with less decay of potential historical signal (i.e., lower average λ scores, supermatrices D, E, F and J in Table 1). On the other hand, progressively more aggressive trimming of hypervariable sites resulted in progressive reduction of total parsimony-informative sites and reduced percentage of parsimony-informative sites (from 43.00% in supermatrix E to 32.80% in supermatrix J, Table 1). In a similar fashion, Dayhoff6-recoding resulted in removal of 40.66% of parsimony-informative sites from supermatrix D (Table 1). Removal of distantly related outgroups from supermatrix F resulted in a less saturated supermatrix according to average λ score, whereas the linear regression of p- and patristic distances under the SHOMU and SHETU models showed a reduced adjusted R² value (i.e., suggesting higher saturation) compared with the dataset before removing distantly related outgroups (i.e., supermatrix F). Comparisons of saturation statistics among datasets and models showed that conventional statistics of substitution saturation (R² and slope of regression) are dependent on the applied model (Table 1). Despite this, removal of distantly related outgroups from supermatrix F resulted in reduced proportion of failed pairwise symmetry tests (Bowker’s test: 24.98%, 20.35%, 18.85% failed tests in supermatrices F, G, H respectively). Removal of long-branched ingroup taxa (see File S1: Table S7) resulted in further decrease in potential deviations from SRH conditions and also in further reduction in the degree of saturation (Bowker’s test: 24.98%, 20.35%, 18.85% failed tests, λ scores: 0.095, 0.079, 0.074 in supermatrices F, G, H, respectively).

Effects of removing hypervariable sites on the branch support statistics of well-established adephagan relationships

We examined how removing hypervariable sites with BMGE using different degrees of stringency affected phylogenetic branch support for previously well-established clades of Adephaga and their outgroups. A clade that includes all adephagan families except Gyrinidae was strongly supported when using a moderate trimming strategy (supermatrices D, E, Fig. 6B) but UFB support for this relationship decreased with more aggressive trimming of the data under the SHETU and PMSF models (SHETU: 93% and 87% support in supermatrices F and J respectively, Fig. 6B). This pattern is also observed under the complex BSHETU model (0.94 and 0.78 posterior probability in supermatrices F and J, respectively, Fig. 6B), whereas analyses under a mis-specified model (SHOMU) still gave strong support for this relationship (98% in supermatrix J). A similar pattern is observed for the monophyly of a clade Haliplidae + Dytiscoidea, which is inferred under all models but receives lower support in the analyses of supermatrices that were trimmed more aggressively (99% and 92% UFB support in supermatrices F and J under the SHETU model respectively, Fig. 6B). Additionally, excessive trimming of supermatrix A resulted in very low UFB support for the monophyly of Coleoptera under the better-fitting SHETU model and even resulted in nonmonophyletic beetles in cases of model misspecification (Fig. 6B, supermatrix F). The monophyly of Aspidytidae is also less well-supported in the analyses of supermatrices that were produced by very stringent trimming (supermatrices F and J, 81% and 99% respectively under the SHETU model, Fig. 6B). Lastly, trimming of the data with progressively more stringent criteria resulted in the increase of the proportion of clades that are poorly supported under the better-fitting SHETU models (total proportion of branches with […] 20 000 amino-acid sites) to be considered phylogenomic datasets, yet the proportion of well-supported clades in their inferred trees is drastically reduced in comparison to less stringently trimmed datasets. These observations suggest that a balance between removing data-driven bias and retaining phylogenetic information should be pursued in phylogenomic analyses.

Fig. 6. Effects of removing hypervariable sites on the branch support statistics when different degrees of trimming stringency were applied (i.e., h = 0.5 for supermatrices D and E, h = 0.4 for supermatrix F, h = 0.3 for supermatrix J). For these comparisons we only included supermatrices that are directly comparable because they resulted from direct trimming of supermatrix A. Supermatrix D resulted from trimming a slightly different version of supermatrix A from which only 12 genes had been removed. (A) Percentage of branches with UFB support lower than 100% (red bars) and lower than 95% (blue bars) under the SHETU model in analyses of selected amino-acid supermatrices. (B) Branch support (UFB or posterior probability) for specific well-established phylogenetic clades of Adephaga and outgroups (based on morphology and other molecular phylogenetic studies) depending on the dataset that was analysed. SHOMU: site-homogeneous unpartitioned, SHETU: site-heterogeneous unpartitioned, PMSF: posterior mean-site frequency profile model, BSHETU: Bayesian site-heterogeneous CAT + GTR + G4 model. The BSHETU analyses of supermatrix D did not reach convergence (maxdiff. = 1), whereas BSHETU analyses of supermatrix F reached the value of maxdiff. = 0.307 (considered here as marginally acceptable, see File S1: Table S8). BSHETU analyses of supermatrix J also reached the acceptable convergence value maxdiff. = 0.293. [Panel A axes: y, percentage of branches (%); x, dataset (supermatrices D, E, F, J). Panel B: clades Adephaga excluding Gyrinidae, Haliplidae + Dytiscoidea, Coleoptera, Aspidytidae and Coleopterida, each scored per supermatrix (D, E, F, J) and model (SHOMU, SHETU, PMSF, BSHETU); branch support classes (UFB or PP): 100 or 1.00, 95–99 or 0.95–0.99, 90–94 or 0.90–0.94, 80–89 or 0.80–0.89, 50–79 or 0.50–0.79, not analyzed, not inferred.]
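The trade-off between trimming stringency and retained phylogenetic information can be illustrated with a toy Python sketch. It removes alignment columns whose variability exceeds a threshold and counts the parsimony-informative sites that remain. Note the assumptions: BMGE computes an entropy-like score weighted by a substitution matrix, whereas this sketch uses plain Shannon entropy normalized by log(20) as a stand-in, so the threshold `h` here is not numerically comparable to the BMGE settings reported in Fig. 6. All names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, ignore="-X?"):
    """Shannon entropy of one amino-acid column, scaled to [0, 1]
    by the maximum possible entropy for 20 states."""
    residues = [c for c in column if c not in ignore]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n)
                   for k in Counter(residues).values())
    return entropy / math.log(20)


def is_parsimony_informative(column, ignore="-X?"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(c for c in column if c not in ignore)
    return sum(1 for k in counts.values() if k >= 2) >= 2


def trim(alignment, h):
    """Keep only the columns whose normalized entropy is at most h."""
    kept = [col for col in zip(*alignment) if column_entropy(col) <= h]
    if not kept:
        return ["" for _ in alignment]
    return ["".join(row) for row in zip(*kept)]


def count_informative(alignment):
    """Number of parsimony-informative columns in the alignment."""
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


# Toy alignment of four taxa and five sites (hypothetical).
alignment = ["ACDEA", "ACDFA", "ACEGC", "ADEHC"]
for threshold in (0.5, 0.3, 0.2):
    trimmed = trim(alignment, threshold)
    print(threshold, trimmed, count_informative(trimmed))
```

In this toy case a lenient threshold removes only the hypervariable, uninformative column, whereas the strictest threshold also discards both parsimony-informative columns, mirroring the loss of signal described for the most stringently trimmed supermatrices.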

Site-heterogeneous models result in greater stability of phylogenetic relationships of Adephaga

Models that account for site-specific amino-acid propensities in the supermatrices, by incorporating heterogeneity in the amino-acid equilibrium frequencies among sites, have been shown to provide a better fit to the data than site-homogeneous models (partitioned or unpartitioned, Feuda et al., 2017). Our analyses confirm these results although our model selection procedure was not performed in a Bayesian framework to include the most complex site-heterogeneous models (i.e., CAT, Lartillot & Philippe, 2004). Nevertheless, recent research shows that when the number of amino-acid equilibrium frequency categories is fixed (e.g., C60 models), the model can potentially describe heterogeneous processes in the data as well as the unconstrained CAT model (Li et al., 2021). Therefore, the use of an unconstrained number of amino-acid equilibrium frequency categories in phylogenetic analyses is not justified (Li et al., 2021). An interesting outcome of our study is that C60 site-heterogeneous models result in more stable phylogenetic relationships than unpartitioned site-homogeneous models. Specifically, we observed that irrespective of the inferred phylogenetic position of Hygrobiidae under the SHOMU model, analyses under the SHETU model (and most analyses under the PMSF model) resulted in a clade Hygrobiidae + (Amphizoidae + Aspidytidae). In addition, comparison of the pairwise RF distances of inferred trees among different models suggests that SHETU models result in more stable phylogenetic relationships of Adephaga and are potentially less affected by the trimming or gene selection regimes. Due to computational limitations, we were not able to test this hypothesis for the CAT + GTR + G4 model, as not all analyses reached convergence and we were not able to perform BSHETU analyses for all datasets. Nevertheless, we suggest that SHETU models may help to reduce incongruence in analyses of different amino-acid supermatrices. Lastly, we corroborate previous claims that site-homogeneous models underestimate substitution saturation (Song et al., 2016; Lozano-Fernandez et al., 2019) for a wide selection of amino-acid datasets and trimming regimes.

GTD analyses and locus-subsampling strategies highlight gene-tree errors in the data

GTD analyses on the complete set of loci but also on the selected subsets of loci suggest that our inferred gene trees are characterized by widespread gene-tree errors. The vast majority of gene trees strongly rejected any given well-known clade in Adephaga or in their outgroup but also any alternative phylogenetic hypothesis for the controversial groupings of Adephaga. Further indirect evidence for the extent of gene-tree errors in our dataset is provided by observing the distribution of phylogenetic information among the inferred gene trees. It is frequently assumed that GTDs are mainly due to biological factors such as ILS (Linkem et al., 2016; Cloutier et al., 2019). Despite this, we consider it unlikely that ILS has affected all possible deep nodes in the phylogeny of Adephaga and their outgroups and therefore suggest that the observed GTD patterns are very likely due to gene-tree errors. This is more apparent when considering that our GTD analyses mostly show strongly rejected alternative phylogenetic hypotheses in the vast majority of relevant gene trees, rather than strongly supported discordance among alternative phylogenetic hypotheses. Our results confirm the views of other authors who suggest that biasing effects of biological GTD are possible but might be less important than other biasing factors such as model misspecification and gene-tree errors at deep evolutionary timescales (Gatesy & Springer, 2014; Bryant & Hahn, 2020). Although there is no direct evidence from our analyses that the errors affect specific branches of our inferred species tree, our observations suggest that the results of the different SCAs cannot be trusted with confidence. This is further corroborated by comparing the distances of the selected concatenation-based tree to the trees inferred with SCA using different subsets of genes. These comparisons show that the coalescent method is sensitive to the set of input gene trees. It is, however, encouraging that the SCA recovered many well-established relationships of Adephaga when all genes are sampled, although some of them with low support. It should be noted that the inability of the SCA to infer congruent results with the concatenation-based tree or strongly supported results might also be due to the small number of genes in the selected gene subsets. Specifically, we observed that species trees inferred using the four smallest subsets of genes had the highest topological distance from the concatenation-based tree. In addition, a recent study showed that the ASTRAL method can infer species trees more accurately when thousands of loci are sampled (Tilic et al., 2020). In our study, we investigated whether using genes with higher phylogenetic information reduces potential gene-tree estimation error, yet the potential of increasing the accuracy of SCA by reducing systematic error has to be explored (e.g., by using empirical site-heterogeneous models, such as C10, for inferring individual gene trees; Quang et al., 2008). Despite this, our results show that selecting genes based on the likelihood mapping criterion may be a better approach than selecting genes based on number of parsimony-informative sites or the average branch support when aiming at reducing incongruence between SCA and concatenation-based analyses. This result is in accordance with previous research that suggests likelihood mapping may be a good a priori estimator of phylogenetic informativeness (Klopfstein et al., 2017).
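The likelihood mapping criterion mentioned above can be sketched in Python as follows. For each quartet of sequences, the likelihoods of the three possible topologies are converted into posterior weights, and the quartet is assigned to the nearest of seven attractors in the weight simplex (three corners for resolved quartets, three edge midpoints for partly resolved quartets, and the centre for unresolved quartets). The nearest-attractor rule, the labels and the toy log-likelihoods are assumptions made for this illustration and do not reproduce the study's pipeline.

```python
import math

# Attractors in the simplex of posterior weights (w1, w2, w3).
ATTRACTORS = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T1|T2": (0.5, 0.5, 0.0),
    "T1|T3": (0.5, 0.0, 0.5),
    "T2|T3": (0.0, 0.5, 0.5),
    "star": (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0),
}
RESOLVED = ("T1", "T2", "T3")


def posterior_weights(logliks):
    """Convert three log-likelihoods into weights that sum to one.
    Subtracting the maximum avoids numerical underflow."""
    best = max(logliks)
    raw = [math.exp(value - best) for value in logliks]
    total = sum(raw)
    return tuple(value / total for value in raw)


def classify_quartet(logliks):
    """Label of the attractor closest to the quartet's weight vector."""
    weights = posterior_weights(logliks)
    return min(ATTRACTORS,
               key=lambda name: math.dist(weights, ATTRACTORS[name]))


def resolved_fraction(quartet_logliks):
    """Share of quartets that fall into one of the three corner regions;
    usable as a per-gene score of phylogenetic information."""
    labels = [classify_quartet(q) for q in quartet_logliks]
    return sum(label in RESOLVED for label in labels) / len(labels)


# Toy log-likelihoods for three quartets of one hypothetical gene.
gene = [(-100.0, -110.0, -111.0),   # clearly favours topology 1
        (-100.0, -100.0, -100.0),   # star-like, unresolved
        (-100.0, -100.0, -120.0)]   # cannot separate topologies 1 and 2
print([classify_quartet(q) for q in gene], resolved_fraction(gene))
```

Genes could then be ranked by `resolved_fraction`, and only those above a chosen cutoff retained for gene-tree inference.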

Conclusions

We provide a new set of DNA-hybridization baits that show great promise in recovering protein-coding exons for evolutionary genomic investigations in Adephaga. Using an extensive sampling of species, by combining hybrid-capture sequence data and transcriptomes, we are able to clarify the phylogenetic relationships of the major groups such as the sister group relationship of Gyrinidae to all other families, a clade Haliplidae + Dytiscoidea, and the sister group relationship of Trachypachidae to a clade Carabidae + Cicindelidae. Furthermore, our extensive analyses under different trimming regimes and models shed light on the evolution of the families in Dytiscoidea. We show that moderate supermatrix trimming and a better-fitting site-heterogeneous model place Hygrobiidae as sister to a clade Amphizoidae + monophyletic Aspidytidae. Excessive removal of hypervariable sites using stringent trimming strategies should be avoided as it can lead to potential reduction in phylogenetic signal and reduced resolution of phylogenetic relationships. Site-heterogeneous models fit the data better but, most importantly, our results show that analyses with C60 site-heterogeneous models result in increased stability of inferred phylogenetic relationships of Dytiscoidea and Adephaga in general. Hence, incongruence between analyses of different subsets of amino-acid supermatrices may be ameliorated by using C60 models. Moreover, our analyses of a carefully curated set of genes suggest that gene-tree errors are prominent in the data and possibly responsible for poorly supported or incongruent species trees in SCA or for incongruent results between concatenation and SCA. Thus, our results show that scientists should take measures to eliminate or minimize gene-tree errors before attributing GTD and phylogenomic incongruence to other factors (e.g., ILS). As we have shown, a promising solution for reducing incongruence between coalescent-based and concatenation-based analyses is to select informative genes based on the likelihood mapping criterion.

Supporting Information

Additional supporting information may be found online in the Supporting Information section at the end of the article.

File S1. All supplementary tables (Tables S1–S12) that include: (1) an overview of transcriptomes used in the study and results of orthology assignment for transcriptomes, (2) detailed statistics of the hybrid-enrichment data including orthology assignment results, (3) results of tiling design experiments for bait design, (4) collection and voucher information for the insect samples processed in this study, (5) Mann–Whitney–Wilcoxon tests for pairwise tests of hybrid-enrichment statistics, (6) summarized results of cross-contamination checks for the hybrid-enrichment data, (7) LB scores of species in supermatrix G, (8) convergence statistics of the Bayesian phylogenetic analyses, (9) model selection results for all amino-acid supermatrices, (10) model comparison statistics for partitioned amino-acid supermatrices based on a fixed neighbor-joining tree, (11) description of analyzed nucleotide supermatrices and their inferred statistics, (12) summarized statistics of the different subsets of genes that were used in SCA.

File S2. Supplementary experimental procedures that are not described in detail in the main materials and methods section.

File S3. All supplementary figures (Figs S1–S106) that include: (1) the summarized workflow for generating amino-acid supermatrices after the MARE step, (2) quartet-concordance results for the tree resulting from the SHETU-based analysis of supermatrix D, (3) box-plots of number of recovered loci per family for the hybrid-enrichment data, (4) box-plots of Ct/Ca ratio for each family of Adephaga, (5) all inferred phylogenetic trees with branch support values, (6) AliStat, SymTest and ALIGROOVE heatmaps for each amino-acid dataset, (7) all saturation plots inferred under different models, (8) all trees inferred from SCA and (9) GTD analyses for analyzed subsets of genes (LM, PI and SH subsets).

Acknowledgements

MB, ON, RGB and BM acknowledge the German Research Foundation (DFG) for funding the project ‘Die Integration von Phylogenomik, Sammlungsbeständen, innovativer Morphologie und umfangreicher paläontologischer Daten - Phylogenie und Evolution der Adephaga (Coleoptera) als Modellfall’ (BA 2152/24-1, NI 1387/7-1, BE 1789/11-1, MI 649/19-1). MB acknowledges support from the SNSB-Innovative scheme. We thank Wendy Moore (University of Arizona, U.S.A.), for granting access to the transcriptome of Metrius contractus before its official release. The authors would like to thank Claudia Etzbauer, Panagiotis Provataris, Jan Philip Oeyen, Malte Petersen and Lars Podsiadlowski for helpful discussions in various steps of the analyses. The authors would also like to thank Dr. Alexandr Prokin for providing valuable background information on fossils of Adephaga. The authors declare that there are no conflicts of interest.

Author contributions

MB, ON, RGB and BM conceived the initial project. MB, DRM, ON, RGB and BM contributed to funding acquisition. AV, MB, ON, RGB and BM designed the study. MB and LH collected and provided insect samples. AV and SK performed all molecular laboratory experiments. AV and JMP performed assembly and cross-contamination checks for transcriptomes. SM and JMP performed NCBI sequence submissions. AV performed assembly, contamination checks and further processing of combined hybrid-enrichment data and transcriptomes. AV and KM performed phylogenetic analyses. CM provided bioinformatic methods for outlier detection. AV, MB and RGB drafted the manuscript with AV taking the lead. All authors contributed with comments in later versions of the manuscript.

Data availability statement

The datasets supporting the conclusions of this article have been deposited in the figshare digital repository (doi: https://doi.org/10.6084/m9.figshare.14838390, ortholog set, bait nucleotide sequences, assemblies of hybrid-enrichment data, filtered multiple sequence alignments, supermatrices, treefiles and custom scripts). New hybrid-enrichment genetic data are deposited in GenBank (Bioproject ID: PRJNA645047, see also File S1: Table S2). Raw reads for the transcriptome of Chlaenius tricolor have been deposited in GenBank (SRA: SRR13633634, see also File S1: Table S1). Open Access funding enabled and organized by Projekt DEAL.

References

Akaike, H. (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Baca, S.M., Alexander, A., Gustafson, G.T. & Short, A.E.Z. (2017a) Ultraconserved elements show utility in phylogenetic inference of Adephaga (Coleoptera) and suggest paraphyly of ‘Hydradephaga’. Systematic Entomology, 42, 786–795.
Baca, S.M., Toussaint, E.F.A., Miller, K.B. & Short, A.E.Z. (2017b) Molecular phylogeny of the aquatic beetle family Noteridae (Coleoptera: Adephaga) with an emphasis on data partitioning strategies. Molecular Phylogenetics and Evolution, 107, 282–292.
Balke, M., Ribera, I. & Beutel, R.G. (2003) ASPIDYTIDAE: on the discovery of a new beetle family: detailed morphological analysis, description of a second species, and key to fossil and extant adephagan families (Coleoptera). Water Beetles of China (ed. by M.A. Jäch and L. Ji), pp. 53–66. Wien: Zoologisch-Botanische Gesellschaft & Wiener Coleopterologenverein.
Balke, M., Ribera, I., Beutel, R., Viloria, A., Garcia, M. & Vogler, A.P. (2008) Systematic placement of the recently discovered beetle family Meruidae (Coleoptera: Dytiscoidea) based on molecular data. Zoologica Scripta, 37, 647–650.
Ballesteros, J.A. & Sharma, P.P. (2019) A critical appraisal of the placement of Xiphosura (Chelicerata) with account of known sources of phylogenetic error. Systematic Biology, 68, 896–917.
Bank, S., Sann, M., Mayer, C. et al. (2017) Transcriptome and target DNA enrichment sequence data provide new insights into the phylogeny of vespid wasps (Hymenoptera: Aculeata: Vespidae). Molecular Phylogenetics and Evolution, 116, 213–226.
Belkaceme, T. (1991) Skelet und Muskulatur des Kopfes und Thorax von Noterus laevis Sturm. Ein Beitrag zur Morphologie und Phylogenie der Noteridae (Coleoptera: Adephaga). Stuttgarter Beiträge zur Naturkunde, 462, 1–94.
Beutel, R.G. (1992a) Larval head structures of Omoglymmius hamatus and their implications for the relationships of Rhysodidae (Coleoptera: Adephaga). Insect Systematics & Evolution, 23, 169–184.
Beutel, R.G. (1992b) Phylogenetic analysis of thoracic structures of Carabidae (Coleoptera: Adephaga). Journal of Zoological Systematics and Evolutionary Research, 30, 53–74.
Beutel, R.G. (1994) On the systematic position of Hydrotrupes palpalis Sharp (Coleoptera: Dytiscidae). Aquatic Insects, 16, 157–164.
Beutel, R.G. & Roughley, R.E. (1987) On the systematic position of the genus Notomicrus Sharp (Hydradephaga, Coleoptera). Canadian Journal of Zoology, 65, 1898–1905.
Beutel, R.G. & Roughley, R.E. (1988) On the systematic position of the family Gyrinidae (Coleoptera: Adephaga). Journal of Zoological Systematics and Evolutionary Research, 26, 380–400.
Beutel, R.G. & Roughley, R.E. (1993) Phylogenetic analysis of Gyrinidae based on characters of the larval head (Coleoptera: Adephaga). Insect Systematics & Evolution, 24, 459–468.
Beutel, R.G. & Ruhnau, S. (1990) Phylogenetic analysis of the genera of Haliplidae (Coleoptera) based on characters of adults. Aquatic Insects, 12, 1–17.
Beutel, R.G., Balke, M. & Steiner, W.E. (2006) The systematic position of Meruidae (Coleoptera, Adephaga) and the phylogeny of the smaller aquatic adephagan beetle families. Cladistics, 22, 102–131.
Beutel, R.G., Wang, B., Tan, J.J., Ge, S.Q., Ren, D. & Yang, X.K. (2013) On the phylogeny and evolution of Mesozoic and extant lineages of Adephaga (Coleoptera, Insecta). Cladistics, 29, 147–165.
Beutel, R.G., Yan, E., Richter, A., Büsse, S., Miller, K.B., Yavorskaya, M. & Wipfler, B. (2017) The head of Heterogyrus milloti (Coleoptera: Gyrinidae) and its phylogenetic implications. Arthropod Systematics and Phylogeny, 75, 261–280.
Beutel, R.G., Pohl, H., Yan, E.V. et al. (2019a) The phylogeny of Coleopterida (Hexapoda) – morphological characters and molecular phylogenies. Systematic Entomology, 44, 75–102.
Beutel, R.G., Yan, E., Yavorskaya, M., Büsse, S., Gorb, S.N. & Wipfler, B. (2019b) On the thoracic anatomy of the Madagascan Heterogyrus milloti and the phylogeny of Gyrinidae (Coleoptera). Systematic Entomology, 44, 336–360.
Beutel, R.G., Ribera, I., Fikáček, M., Vasilikopoulos, A., Misof, B. & Balke, M. (2020) The morphological evolution of the Adephaga (Coleoptera). Systematic Entomology, 45, 378–395.
Bi, K., Vanderpool, D., Singhal, S., Linderoth, T., Moritz, C. & Good, J.M. (2012) Transcriptome-based exon capture enables highly cost-effective comparative genomic data collection at moderate evolutionary scales. BMC Genomics, 13, 403.
Bolger, A.M., Lohse, M. & Usadel, B. (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30, 2114–2120.
Bowker, A.H. (1948) A test for symmetry in contingency tables. Journal of the American Statistical Association, 43, 572–574.
Bradley, R.K., Roberts, A., Smoot, M. et al. (2009) Fast statistical alignment. PLoS Computational Biology, 5, e1000392.
Bragg, J.G., Potter, S., Bi, K. & Moritz, C. (2016) Exon capture phylogenomics: efficacy across scales of divergence. Molecular Ecology Resources, 16, 1059–1068.
Bryant, D. & Hahn, M.W. (2020) The concatenation question. Phylogenetics in the genomic era (ed. by C. Scornavacca, F. Delsuc and N. Galtier), pp. 3.4:1–3.4:23. https://hal.archives-ouvertes.fr/hal02535651.
Burmeister, E.-G. (1976) Der Ovipositor der Hydradephaga (Coleoptera) und seine phylogenetische Bedeutung unter besonderer Berücksichtigung der Dytiscidae. Zoomorphologie, 85, 165–257.
Cai, C., Tihelka, E., Pisani, D. & Donoghue, P.C.J. (2020) Data curation and modeling of compositional heterogeneity in insect phylogenomics: a case study of the phylogeny of Dytiscoidea (Coleoptera: Adephaga). Molecular Phylogenetics and Evolution, 147, 106782.
Chernomor, O., Von Haeseler, A. & Minh, B.Q. (2016) Terrace aware data structure for phylogenomic inference from supermatrices. Systematic Biology, 65, 997–1008.
Cloutier, A., Sackton, T.B., Grayson, P., Clamp, M., Baker, A.J. & Edwards, S.V. (2019) Whole-genome analyses resolve the phylogeny of flightless birds (Palaeognathae) in the presence of an empirical anomaly zone. Systematic Biology, 68, 937–955.
Criscuolo, A. & Gribaldo, S. (2010) BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary Biology, 10, 210.
Crotty, S.M., Minh, B.Q., Bean, N.G., Holland, B.R., Tuke, J., Jermiin, L.S. & Von Haeseler, A. (2020) GHOST: recovering historical signal from heterotachously evolved sequence alignments. Systematic Biology, 69, 249–264.
Crowson, R.A. (1960) The phylogeny of Coleoptera. Annual Review of Entomology, 5, 111–134.
Delsuc, F., Brinkmann, H. & Philippe, H. (2005) Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics, 6, 361–375.
Désamoré, A., Laenen, B., Miller, K.B. & Bergsten, J. (2018) Early burst in body size evolution is uncoupled from species diversification in diving beetles (Dytiscidae). Molecular Ecology, 27, 979–993.
Dietz, L., Dömel, J.S., Leese, F., Mahon, A.R. & Mayer, C. (2019) Phylogenomics of the longitarsal Colossendeidae: the evolutionary history of an Antarctic sea spider radiation. Molecular Phylogenetics and Evolution, 136, 206–214.
Dressler, C. & Beutel, R.G. (2010) The morphology and evolution of the adult head of Adephaga (Insecta: Coleoptera). Arthropod Systematics and Phylogeny, 68, 239–287.
Duran, D.P. & Gough, H.M. (2020) Validation of tiger beetles as distinct family (Coleoptera: Cicindelidae), review and reclassification of tribal relationships. Systematic Entomology, 45, 723–729.
Evangelista, D., Simon, S., Wilson, M.M. et al. (2021) Assessing support for Blaberoidea phylogeny suggests optimal locus quality. Systematic Entomology, 46, 157–171.
Faircloth, B.C., McCormack, J.E., Crawford, N.G., Harvey, M.G., Brumfield, R.T. & Glenn, T.C. (2012) Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Systematic Biology, 61, 717–726.
Feuda, R., Dohrmann, M., Pett, W. et al. (2017) Improved modeling of compositional heterogeneity supports sponges as sister to all other animals. Current Biology, 27, 3864–3870.
Forsyth, D.J. (1970) The structure of the defence glands of the Cicindelidae, Amphizoidae, and Hygrobiidae (Insecta: Coleoptera). Journal of Zoology, 160, 51–69.
Foster, P. (2004) Modeling compositional heterogeneity. Systematic Biology, 53, 485–495.
Freitas, F.V., Branstetter, M.G., Griswold, T. & Almeida, E.A.B. (2021) Partitioned gene-tree analyses and gene-based topology testing help resolve incongruence in a phylogenomic study of host-specialist bees (Apidae: Eucerinae). Molecular Biology and Evolution, 38, 1090–1100.
Gatesy, J. & Springer, M.S. (2014) Phylogenetic analysis at deep timescales: unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Molecular Phylogenetics and Evolution, 80, 231–266.
Gontcharov, A.A., Marin, B. & Melkonian, M. (2004) Are combined analyses better than single gene phylogenies? A case study using SSU rDNA and rbcL sequence comparisons in the Zygnematophyceae (Streptophyta). Molecular Biology and Evolution, 21, 612–624.

Gough, H.M., Duran, D.P., Kawahara, A.Y. & Toussaint, E.F.A. (2019) A comprehensive molecular phylogeny of tiger beetles (Coleoptera, Carabidae, Cicindelinae). Systematic Entomology, 44, 305–321. Gough, H.M., Allen, J.M., Toussaint, E.F.A., Storer, C.G. & Kawahara, A.Y. (2020) Transcriptomics illuminate the phylogenetic backbone of tiger beetles. Biological Journal of the Linnean Society, 129, 740–751. Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology, 59, 307–321. Gustafson, G.T., Bergsten, J., Ranarilalatiana, T., Randriamihaja, J.H. & Miller, K.B. (2017a) The morphology and behavior of the endemic Malagasy whirligig beetle Heterogyrus milloti Legros, 1953 (Coleoptera: Gyrinidae: Heterogyrinae). The Coleopterists Bulletin, 71, 315–328. Gustafson, G.T., Prokin, A.A., Bukontaite, R., Bergsten, J. & Miller, K.B. (2017b) Tip-dated phylogeny of whirligig beetles reveals ancient lineage surviving on Madagascar. Scientific Reports, 7, 8619. Gustafson, G.T., Alexander, A., Sproul, J.S., Pflug, J.M., Maddison, D.R. & Short, A.E.Z. (2019) Ultraconserved element (UCE) probe set design: base genome and initial design parameters critical for optimization. Ecology and Evolution, 9, 6933–6948. Gustafson, G.T., Baca, S.M., Alexander, A.M. & Short, A.E.Z. (2020) Phylogenomic analysis of the beetle suborder Adephaga with comparison of tailored and generalized ultraconserved element probe performance. Systematic Entomology, 45, 552–570. Gustafson, G.T., Miller, K.B., Michat, M.C., Alarie, Y., Baca, S.M., Balke, M. & Short, A.E.Z. (2021) The enduring value of reciprocal illumination in the era of insect phylogenomics: a response to Cai et al. (2020). Systematic Entomology, 46, 473–486. Hawlitschek, O., Hendrich, L. & Balke, M. 
(2012) Molecular phylogeny of the squeak beetles, a family with disjunct Palearctic-Australian range. Molecular Phylogenetics and Evolution, 62, 550–554. Hoang, D.T., Chernomor, O., von Haeseler, A., Minh, B.Q. & Le, S.V. (2018) UFBoot2: improving the ultrafast bootstrap approximation. Molecular Biology and Evolution, 35, 518–522. Hosner, P.A., Faircloth, B.C., Glenn, T.C., Braun, E.L. & Kimball, R.T. (2016) Avoiding missing data biases in phylogenomic inference: an empirical study in the landfowl (Aves: Galliformes). Molecular Biology and Evolution, 33, 1110–1125. Huerta-Cepas, J., Serra, F. & Bork, P. (2016) ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Molecular Biology and Evolution, 33, 1635–1638. Hunt, T., Bergsten, J., Levkanicova, Z. et al. (2007) A comprehensive phylogeny of beetles reveals the evolutionary origins of a superradiation. Science, 318, 1913–1916. Irisarri, I. & Meyer, A. (2016) The identification of the closest living relative(s) of tetrapods: phylogenomic lessons for resolving short ancient internodes. Systematic Biology, 65, 1057–1075. Jäch, M.A. & Balke, M. (2008) Global diversity of water beetles (Coleoptera) in freshwater. Hydrobiologia, 595, 419–442. Jermiin, L.S. & Misof, B. (2020) Measuring historical and compositional signals in phylogenetic data. bioRxiv. https://doi.org/10.1101/2020.01.03.894097. Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K.F., von Haeseler, A. & Jermiin, L.S. (2017) ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods, 14, 587–589. Kapli, P. & Telford, M.J. (2020) Topology dependent asymmetry in systematic errors affects phylogenetic placement of Ctenophora and Xenacoelomorpha. Science Advances, 6, eabc5162.

© 2021 The Authors. Systematic Entomology published by John Wiley & Sons Ltd on behalf of Royal Entomological Society. doi: 10.1111/syen.12508

Kapli, P., Yang, Z. & Telford, M.J. (2020) Phylogenetic tree building in the genomic age. Nature Reviews Genetics, 21, 428–444. Karin, B.R., Gamble, T., Jackman, T.R. & Vidal, N. (2020) Optimizing phylogenomics with rapidly evolving long exons: comparison with anchored hybrid enrichment and ultraconserved elements. Molecular Biology and Evolution, 37, 904–922. Kavanaugh, D.H. (1986) A systematic review of amphizoid beetles (Amphizoidae: Coleoptera) and their phylogenetic relationships to other Adephaga. Proceedings of the California Academy of Sciences, 44, 67–109. Klopfstein, S., Massingham, T. & Goldman, N. (2017) More on the best evolutionary rate for phylogenetic analysis. Systematic Biology, 66, 769–785. Kocot, K.M., Struck, T.H., Merkel, J. et al. (2017) Phylogenomics of Lophotrochozoa with consideration of systematic error. Systematic Biology, 66, 256–282. Kück, P. & Longo, G.C. (2014) FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies. Frontiers in Zoology, 11, 81. Kück, P. & Struck, T.H. (2014) BaCoCa - a heuristic software tool for the parallel assessment of sequence biases in hundreds of gene and taxon partitions. Molecular Phylogenetics and Evolution, 70, 94–98. Kück, P., Meusemann, K., Dambach, J., Thormann, B., von Reumont, B.M., Wägele, J.W. & Misof, B. (2010) Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Frontiers in Zoology, 7, 10. Kück, P., Meid, S.A., Groß, C., Wägele, J.W. & Misof, B. (2014) AliGROOVE - visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support. BMC Bioinformatics, 15, 294. Lanfear, R., Calcott, B., Kainer, D., Mayer, C. & Stamatakis, A. (2014) Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology, 14, 82. Lartillot, N. & Philippe, H. 
(2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution, 21, 1095–1109. Lartillot, N., Brinkmann, H. & Philippe, H. (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evolutionary Biology, 7, S4. Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. (2013) PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Systematic Biology, 62, 611–615. Laumer, C.E., Fernández, R., Lemer, S. et al. (2019) Revisiting metazoan phylogeny with genomic sampling of all phyla. Proceedings of the Royal Society B: Biological Sciences, 286, 20190831. Lawrence, J.F. & Newton, A.F., Jr (1982) Evolution and classification of beetles. Annual Review of Ecology and Systematics, 13, 261–290. Lawrence, J.F., Ślipiński, A., Seago, A.E., Thayer, M.K., Newton, A.F. & Marvaldi, A.E. (2011) Phylogeny of the Coleoptera based on morphological characters of adults and larvae. Annales Zoologici, 61, 1–217. Lemmon, E.M. & Lemmon, A.R. (2013) High-throughput genomic data in systematics and phylogenetics. Annual Review of Ecology, Evolution, and Systematics, 44, 99–121. Lemmon, A.R., Brown, J.M., Stanger-Hall, K. & Lemmon, E.M. (2009) The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Systematic Biology, 58, 130–145. Lemmon, A.R., Emme, S.A. & Lemmon, E.M. (2012) Anchored hybrid enrichment for massively high-throughput phylogenomics. Systematic Biology, 61, 727–744. Li, H. & Durbin, R. (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25, 1754–1760.

Li, H., Handsaker, B., Wysoker, A. et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078–2079. Li, Y., Shen, X.-X., Evans, B., Dunn, C.W. & Rokas, A. (2021) Rooting the animal tree of life. Molecular Biology and Evolution. https://doi.org/10.1093/molbev/msab170. Linkem, C.W., Minin, V.N. & Leaché, A.D. (2016) Detecting the anomaly zone in species trees and evidence for a misleading signal in higher-level skink phylogeny (Squamata: Scincidae). Systematic Biology, 65, 465–477. López-López, A. & Vogler, A.P. (2017) The mitogenome phylogeny of Adephaga (Coleoptera). Molecular Phylogenetics and Evolution, 114, 166–174. Lorenz, W. (2020) CarabCat: global database of ground beetles (version Oct 2017) [WWW document]. Species 2000 ITIS Cat. Life, 2020-12-01. URL https://www.catalogueoflife.org/ [accessed on 12 November 2020]. Lozano-Fernandez, J., Tanner, A.R., Giacomelli, M., Carton, R., Vinther, J., Edgecombe, G.D. & Pisani, D. (2019) Increasing species sampling in chelicerate genomic-scale datasets provides support for monophyly of Acari and Arachnida. Nature Communications, 10, 2295. Maddison, D.R., Moore, W., Baker, M.D., Ellis, T.M., Ober, K.A., Cannone, J.J. & Gutell, R.R. (2009) Monophyly of terrestrial adephagan beetles as indicated by three nuclear genes (Coleoptera: Carabidae and Trachypachidae). Zoologica Scripta, 38, 43–62. Mayer, C., Sann, M., Donath, A. et al. (2016) BaitFisher: a software package for multispecies target DNA enrichment probe design. Molecular Biology and Evolution, 33, 1875–1886. Mayer, C., Dietz, L., Call, E., Kukowka, S., Martin, S. & Espeland, M. (2021) Adding leaves to the Lepidoptera tree: capturing hundreds of nuclear genes from old museum specimens. Systematic Entomology, 46, 649–671. McCormack, J.E., Hird, S.M., Zellmer, A.J., Carstens, B.C. & Brumfield, R.T. (2013) Applications of next-generation sequencing to phylogeography and phylogenetics. Molecular Phylogenetics and Evolution, 66, 526–538.
McKenna, D.D., Wild, A.L., Kanda, K. et al. (2015) The beetle tree of life reveals that Coleoptera survived end-Permian mass extinction to diversify during the Cretaceous terrestrial revolution. Systematic Entomology, 40, 835–880. McKenna, D.D., Shin, S., Ahrens, D. et al. (2019) The evolution and genomic basis of beetle diversity. Proceedings of the National Academy of Sciences, 116, 24729–24737. Michat, M.C., Alarie, Y. & Miller, K.B. (2017) Higher-level phylogeny of diving beetles (Coleoptera: Dytiscidae) based on larval characters. Systematic Entomology, 42, 734–767. Miller, K.B. (2001) On the phylogeny of the Dytiscidae (Insecta: Coleoptera) with emphasis on the morphology of the female reproductive system. Insect Systematics & Evolution, 32, 45–92. Miller, K.B. (2009) On the systematics of Noteridae (Coleoptera: Adephaga: Hydradephaga): phylogeny, description of a new tribe, genus and species, and survey of female genital morphology. Systematics and Biodiversity, 7, 191–214. Miller, K.B. & Bergsten, J. (2012) Phylogeny and classification of whirligig beetles (Coleoptera: Gyrinidae): relaxed-clock model outperforms parsimony and time-free Bayesian analyses. Systematic Entomology, 37, 706–746. Miller, K.B. & Bergsten, J. (2014) The phylogeny and classification of predaceous diving beetles. Ecology, systematics, and the natural history of predaceous diving beetles (Coleoptera: Dytiscidae) (ed. by D. Yee), pp. 49–172. Springer, Dordrecht. Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D., von Haeseler, A. & Lanfear, R. (2020) IQ-TREE 2: new models

and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution, 37, 1530–1534. Mirarab, S., Bayzid, M.S. & Warnow, T. (2016) Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. Systematic Biology, 65, 366–380. Misof, B. & Misof, K. (2009) A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Systematic Biology, 58, 21–34. Misof, B., Rickert, A.M., Buckley, T.R., Fleck, G. & Sauer, K.P. (2001) Phylogenetic signal and its decay in mitochondrial SSU and LSU rRNA gene fragments of Anisoptera. Molecular Biology and Evolution, 18, 27–37. Misof, B., Meyer, B., von Reumont, B.M., Kück, P., Misof, K. & Meusemann, K. (2013) Selecting informative subsets of sparse supermatrices increases the chance to find correct trees. BMC Bioinformatics, 14, 348. Misof, B., Liu, S., Meusemann, K. et al. (2014a) Phylogenomics resolves the timing and pattern of insect evolution. Science, 346, 763–767. Misof, B., Meusemann, K., von Reumont, B.M., Kück, P., Prohaska, S.J. & Stadler, P.F. (2014b) A priori assessment of data quality in molecular phylogenetics. Algorithms for Molecular Biology, 9, 22. Mongiardino Koch, N. & Thompson, J.R. (2021) A total-evidence dated phylogeny of Echinoidea combining phylogenomic and paleontological data. Systematic Biology, 70, 421–439. Naser-Khdour, S., Minh, B.Q., Zhang, W., Stone, E.A., Lanfear, R. & Bryant, D. (2019) The prevalence and impact of model violations in phylogenetic analysis. Genome Biology and Evolution, 11, 3341–3352. Nguyen, L.-T., Schmidt, H.A., Von Haeseler, A. & Minh, B.Q. (2015) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32, 268–274. Nilsson, A.N. (1989) On the genus Agabetes Crotch (Coleoptera, Dytiscidae), with a new species from Iran.
Annales Entomologici Fennici, 55, 35–40. Nosenko, T., Schreiber, F., Adamska, M. et al. (2013) Deep metazoan phylogeny: when different genes tell different stories. Molecular Phylogenetics and Evolution, 67, 223–233. Pease, J.B., Brown, J.W., Walker, J.F., Hinchliff, C.E. & Smith, S.A. (2018) Quartet sampling distinguishes lack of support from conflicting support in the green plant tree of life. American Journal of Botany, 105, 385–403. Peng, Y., Leung, H.C.M., Yiu, S.M. & Chin, F.Y.L. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28, 1420–1428. Petersen, M., Meusemann, K., Donath, A. et al. (2017) Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes. BMC Bioinformatics, 18, 111. Philippe, H., de Vienne, D.M., Ranwez, V., Roure, B., Baurain, D. & Delsuc, F. (2017) Pitfalls in supermatrix phylogenomics. European Journal of Taxonomy, 283, 1–25. Phillips, M.J., Delsuc, F. & Penny, D. (2004) Genome-scale phylogeny and the detection of systematic biases. Molecular Biology and Evolution, 21, 1455–1458. Portik, D.M. & Wiens, J.J. (2021) Do alignment and trimming methods matter for phylogenomic (UCE) analyses? Systematic Biology, 70, 440–462. Quang, L.S., Gascuel, O. & Lartillot, N. (2008) Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics, 24, 2317–2323. R Core Team (2020) R: A Language and Environment for Statistical Computing.

Ranwez, V., Douzery, E.J.P., Cambon, C., Chantret, N. & Delsuc, F. (2018) MACSE v2: toolkit for the alignment of coding sequences accounting for frameshifts and stop codons. Molecular Biology and Evolution, 35, 2582–2584. Ribera, I., Hogan, J.E. & Vogler, A.P. (2002) Phylogeny of hydradephagan water beetles inferred from 18S rRNA sequences. Molecular Phylogenetics and Evolution, 23, 43–62. Ribera, I., Vogler, A.P. & Balke, M. (2008) Phylogeny and diversification of diving beetles (Coleoptera: Dytiscidae). Cladistics, 24, 563–590. Robinson, D.F. & Foulds, L.R. (1981) Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131–147. Rodríguez-Ezpeleta, N., Brinkmann, H., Roure, B., Lartillot, N., Lang, B.F. & Philippe, H. (2007) Detecting and overcoming systematic errors in genome-scale phylogenies. Systematic Biology, 56, 389–399. Sann, M., Niehuis, O., Peters, R.S. et al. (2018) Phylogenomic analysis of Apoidea sheds new light on the sister group of bees. BMC Evolutionary Biology, 18, 71. Sayyari, E., Whitfield, J.B. & Mirarab, S. (2017) Fragmentary gene sequences negatively impact gene tree and species tree reconstruction. Molecular Biology and Evolution, 34, 3279–3291. Sayyari, E., Whitfield, J.B. & Mirarab, S. (2018) DiscoVista: interpretable visualizations of gene tree discordance. Molecular Phylogenetics and Evolution, 122, 110–115. Sharma, P.P., Kaluziak, S.T., Pérez-Porro, A.R., González, V.L., Hormiga, G., Wheeler, W.C. & Giribet, G. (2014) Phylogenomic interrogation of Arachnida reveals systemic conflicts in phylogenetic signal. Molecular Biology and Evolution, 31, 2963–2984. Short, A.E.Z. (2018) Systematics of aquatic beetles (Coleoptera): current state and future directions. Systematic Entomology, 43, 1–18. Shull, V.L., Vogler, A.P., Baker, M.D., Maddison, D.R. & Hammond, P.M. 
(2001) Sequence alignment of 18S ribosomal RNA and the basal relationships of adephagan beetles: evidence for monophyly of aquatic families and the placement of Trachypachidae. Systematic Biology, 50, 945–969. Simion, P., Belkhir, K., François, C. et al. (2018) A software tool ‘CroCo’ detects pervasive cross-species contamination in next generation sequencing data. BMC Biology, 16, 28. Simion, P., Delsuc, F. & Philippe, H. (2020) To what extent current limits of phylogenomics can be overcome? Phylogenetics in the genomic era (ed. by C. Scornavacca, F. Delsuc and N. Galtier), pp. 2.1:1–2.1:34. https://hal.archives-ouvertes.fr/hal-02535366. Simmons, M.P. & Kessenich, J. (2020) Divergence and support among slightly suboptimal likelihood gene trees. Cladistics, 36, 322–340. Song, H., Sheffield, N.C., Cameron, S.L., Miller, K.B. & Whiting, M.F. (2010) When phylogenetic assumptions are violated: base compositional heterogeneity and among-site rate variation in beetle mitochondrial phylogenomics. Systematic Entomology, 35, 429–448. Song, F., Li, H., Jiang, P. et al. (2016) Capturing the phylogeny of Holometabola with mitochondrial genome data and Bayesian site-heterogeneous mixture models. Genome Biology and Evolution, 8, 1411–1426. Spangler, P.J. & Steiner, W.E. Jr (2005) A new aquatic beetle family, Meruidae, from Venezuela (Coleoptera: Adephaga). Systematic Entomology, 30, 339–357. Steenwyk, J.L., Buida, T.J. III, Li, Y., Shen, X.-X. & Rokas, A. (2020) ClipKIT: a multiple sequence alignment trimming software for accurate phylogenomic inference. PLoS Biology, 18, e3001007. Strimmer, K. & von Haeseler, A. (1997) Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proceedings of the National Academy of Sciences, 94, 6815–6819. Stuart, A. (1955) A test for homogeneity of the marginal distributions in a two-way classification. Biometrika, 42, 412–416.

Suyama, M., Torrents, D. & Bork, P. (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research, 34, 609–612. Talavera, G. & Castresana, J. (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology, 56, 564–577. Tan, G., Muffato, M., Ledergerber, C., Herrero, J., Goldman, N., Gil, M. & Dessimoz, C. (2015) Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference. Systematic Biology, 64, 778–791. Tilic, E., Sayyari, E., Stiller, J., Mirarab, S. & Rouse, G.W. (2020) More is needed – thousands of loci are required to elucidate the relationships of the ‘flowers of the sea’ (Sabellida, Annelida). Molecular Phylogenetics and Evolution, 151, 106892. Toussaint, E.F.A., Beutel, R.G., Morinière, J. et al. (2016) Molecular phylogeny of the highly disjunct cliff water beetles from South Africa and China (Coleoptera: Aspidytidae). Zoological Journal of the Linnean Society, 176, 537–546. Toussaint, E.F.A., Seidel, M., Arriaga-Varela, E. et al. (2017) The peril of dating beetles. Systematic Entomology, 42, 1–10. Vasilikopoulos, A., Balke, M., Beutel, R.G. et al. (2019) Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Molecular Phylogenetics and Evolution, 135, 270–285. Vasilikopoulos, A., Gustafson, G.T., Balke, M., Niehuis, O., Beutel, R.G. & Misof, B. (2021) Resolving the phylogenetic position of Hygrobiidae (Coleoptera: Adephaga) requires objective statistical tests and exhaustive phylogenetic methodology: a response to Cai et al. (2020). Molecular Phylogenetics and Evolution, 162, 106923.

van Vondel, B.J. (2019) Features of the metacoxal air-storage space as additional characters for reconstructing the phylogeny of Haliplidae (Coleoptera). Tijdschrift voor Entomologie, 162, 13–32. Wang, H.-C., Minh, B.Q., Susko, E. & Roger, A.J. (2018) Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Systematic Biology, 67, 216–235. Wang, H.-C., Susko, E. & Roger, A.J. (2019) The relative importance of modeling site pattern heterogeneity versus partition-wise heterotachy in phylogenomic inference. Systematic Biology, 68, 1003–1019. Whelan, S., Irisarri, I. & Burki, F. (2018) PREQUAL: detecting non-homologous characters in sets of unaligned homologous sequences. Bioinformatics, 34, 3929–3930. Wong, T.K.F., Kalyaanamoorthy, S., Meusemann, K., Yeates, D.K., Misof, B. & Jermiin, L.S. (2020) A minimum reporting standard for multiple sequence alignments. NAR Genomics and Bioinformatics, 2, lqaa024. https://doi.org/10.1093/nargab/lqaa024. Young, A.D. & Gillung, J.P. (2020) Phylogenomics – principles, opportunities and pitfalls of big-data phylogenetics. Systematic Entomology, 45, 225–247. Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. (2018a) ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics, 19, 153. Zhang, S.-Q., Che, L.-H., Li, Y., Liang, D., Pang, H., Ślipiński, A. & Zhang, P. (2018b) Evolutionary history of Coleoptera revealed by extensive sampling of genes and species. Nature Communications, 9, 205.

Accepted 1 July 2021

Species

Agrilus planipennis

Aleochara curtula

Amphizoa insolens

Amphizoa lecontei

Anomala sp.

Aspidytes niobe

Batrachomatus nannup

Bembidion corgenoma

Calosoma frigidum

Carabus granulatus

Chlaenius tricolor

Cicindela hybrida

Clinidium baldufi

Cybister lateralimarginalis

Dineutus sp.

Elaphrus aureus

Fibla maclachlani

Gyrinus marinus

Haliplus (Haliplus) fluviatilis

Hydroscapha redfordi

Hygrobia hermanni

Hygrobia nigra

Lepicerus sp.

Liopterus haemorrhoidalis

Melanotus villosus

Micromalthus debilis

Noterus clavicornis

Oxoplatypus quadridentatus

Pogonus chalceus

Priacma serrata

Protohermes xanthodes

Pseudimares aphrodite

Puncha ratzeburgi

Sinaspidytes wrasei

Stylops melittae

Thermonectus intermedius

Trachypachus gibbsii

Xenos vesparum

No.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

Strepsiptera

Coleoptera

Coleoptera

Strepsiptera

Coleoptera

Raphidioptera

Neuroptera

Megaloptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Raphidioptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Coleoptera

Order

Elateridae

Dytiscidae

Lepiceridae

Hygrobiidae

Hygrobiidae

Hydroscaphidae

Haliplidae

Gyrinidae

Raphidiidae

Carabidae

Gyrinidae

Dytiscidae

Carabidae

Carabidae

Carabidae

Carabidae

Carabidae

Carabidae

Dytiscidae

Amphizoidae

Scarabaeidae

Amphizoidae

Amphizoidae

Staphylinidae

Buprestidae

Family

Carabidae

Curculionidae

Noteridae

Adephaga

Adephaga

Adephaga

Xenidae

Trachypachidae

Dytiscidae

Stylopidae

Aspidytidae

Raphidiidae

Myrmeleontidae

Corydalidae

Archostemata Cupedidae

Adephaga

Adephaga

Adephaga

Archostemata Micromalthidae

Polyphaga

Adephaga

Myxophaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Adephaga

Polyphaga

Adephaga

Adephaga

Polyphaga

Polyphaga

Suborder

McKenna et al. (2019)

Source of data, see File S2 for references

Corydalinae

Trechinae

Copelatinae

Gyrininae

Elaphrinae

Gyrininae

Cybistrinae

Rhysodinae

Cicindelinae

Harpalinae

Carabinae

Carabinae

Trechinae

Matinae

Rutelinae

McKenna et al. (2019)

McKenna et al. (2019)

Boussau et al. (2014)

Misof et al. (2014)

Vasilikopoulos et al. (2019)

Vasilikopoulos et al. (2020)

Vasilikopoulos et al. (2020)

Vasilikopoulos et al. (2020)

Peters et al. (2014)

Van Belleghem et al. (2012)

McKenna et al. (2019)

Vasilikopoulos et al. (2019)

McKenna et al. (2019)

McKenna et al. (2019)

Vasilikopoulos et al. (2019)

Misof et al. (2014)

Vasilikopoulos et al. (2019)

Vasilikopoulos et al. (2019)

McKenna et al. (2019)

Vasilikopoulos et al. (2019)

Misof et al. (2014)

Vasilikopoulos et al. (2020)

McKenna et al. (2019)

Vasilikopoulos et al. (2019)

Vasilikopoulos et al. (2019)

McKenna et al. (2019)

McKenna et al. (2019)

this study

Peters et al. (2014)

Seppey et al. (2019), McKenna et al. (2019)

Pflug et al. (2020)

Vasilikopoulos et al. (2019)

Vasilikopoulos et al. (2019)

McKenna et al. (2019)

Vasilikopoulos et al. (2019)

Vasilikopoulos et al. (2019)

Aleocharinae Misof et al. (2014), Pauli et al. (2016)

Agrilinae

Subfamily

SRS1130134

SRS4551443

SRS462933

not submitted, available from figshare (link in Boussau et al., 2014)

SRS976415

SRS851413

SRS851414

SRS851415

SRS369694

SRS295765, SRS295764, SRS295760

SRS976364

SRS976372

SRS976335

SRS976331

SRS2403778

SRS462869

SRS2401245

SRS2401244

SRS976387

SRS976318

SRS462864

SRS851455

SRS976395

SRS976399

SRS976403

SRS976313

SRS976316

SRS8198743

SRS369485

SRS976409

SRS4551447

SRS2403779

SRS2401200

SRS976424

SRS2403774

SRS2428194

SRS462784

SRS976425

yes

Not performed cross-contamination check and vector cont. check performed in this study

yes

yes

yes

yes

yes

Not performed

Not performed

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

yes

Not performed cross-contamination check and vector cont. check performed in this study

yes

yes cross-contamination check and vector cont. check performed in this study

yes

yes

yes

yes cross-contamination check and vector cont. check performed in this study

yes

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

581

651

651

587

651

647

645

647

533

651

639

651

646

651

651

646

650

649

650

651

651

632

651

651

651

651

651

651

593

650

650

651

651

649

650

650

645

635

NCBI-SRS number

Cross-contaminations and vector contaminations previously performed

Used for bait design

No. of orthologous hits

186,208

239,692

232,631

186,413

237,248

234,676

230,237

235,991

84,831

234,803

232,576

237,432

227,128

246,187

217,978

222,557

233,754

240,424

232,794

239,552

227,318

209,371

241,439

217,974

239,346

209,510

239,549

241,593

135,447

224,402

233,116

230,926

239,432

239,180

230,613

237,776

233,241

217,257

4

0

0

6

20

4

4

1

0

0

3

3

9

23

0

7

5

7

2

27

6

21

21

26

6

9

7

0

0

1

0

0

6

5

0

0

1

1

Total no. of amino-acids

No. of Xs

4

2

1

4

2

5

5

4

13

9

7

6

6

3

10

10

8

1

3

6

3

5

7

2

5

6

5

3

4

6

8

3

4

2

5

4

3

5

No. of stop signs

381

431

422

372

429

419

416

428

179

420

430

433

418

445

388

407

432

438

426

438

407

389

438

395

435

372

436

438

269

406

420

412

438

439

431

436

423

399

N50 of lengths

320

368

357

317

364

362

356

364

159

360

363

364

351

378

334

344

359

370

358

367

349

331

370

334

367

321

367

371

228

345

358

354

367

368

354

365

361

342

Average length

273.0

323.0

316.0

261.0

317.0

315.0

307.0

312.0

155.0

314.0

313.0

314.0

310.0

328.0

297.0

305.0

317.5

322.0

315.0

324.0

315.0

290.0

322.0

308.0

320.0

288.0

322.0

322.0

208.0

304.0

304.5

317.0

320.0

318.0

309.5

318.5

318.0

306.0

Median length

1,849

2,366

2,365

2,362

2,366

2,372

2,052

2,370

553

2,069

2,370

2,366

1,061

2,132

1,092

1,222

1,448

2,366

1,419

2,366

1,223

1,220

2,366

1,387

2,366

1,094

2,272

2,365

790

1,267

2,365

1,291

1,678

2,154

1,153

1,915

1,217

1,222

Max. length

Table S1: Overview of the transcriptomes that were used for bait design and for downstream phylogenetic reconstructions. Statistics of the orthology assignment with Orthograph, based on the 651 genes for which baits were originally designed, are also provided (amino-acid output of Orthograph). Note that for initial bait design we also included the transcriptome of Metrius contractus (kindly provided by Wendy Moore), which was not included in downstream orthology assignment and phylogenetic analyses.

21

77

52

26

61

62

50

59

28

73

16

66

55

57

58

65

74

76

75

76

85

63

67

66

71

44

65

77

33

74

73

49

73

69

47

60

77

61

Min. length

Noteridae

Dytiscidae

Dytiscidae

Carabidae

Dytiscidae

Dytiscidae

Carabidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Carabidae

Carabidae

Carabidae

Dytiscidae

Haliplidae

Haliplidae

Haliplidae

Dytiscidae

Dytiscidae

Noteridae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Dytiscidae

Carabidae

Dytiscidae

Carabidae

Gyrinidae

Cicindelidae

Dytiscidae

Dytiscidae

14 Canthydrus sp.

15 Caperhantus cicurius

16 Celina imitatrix

17 Clivina sp.

18 Copelatus caelatipennis

19 Coptotomus sp.

20 Cychrus sp.

21 Derovatellus peruanus

22 Dytiscus marginalis

23 Eretes griseus

24 Exocelina sp.

25 Galerita sp.

26 Glyptolenus sp.

27 Goniotropis sp.

28 Graptodytes pictus

29 Haliplus (Haliplidius) confinis

30 Haliplus (Liaphlus) laminatus

31 Haliplus (Neohaliplus) lineatocollis

32 Hydaticus pacificus

33 Hyderodes shuckardi

34 Hydrocanthus oblongus

35 Hydrodytes opalinus

36 Hydroglyphus geminus

37 Hydroporus erythrocephalus

38 Hydrotrupes palpalis

39 Hydrovatus fasciatus

40 Hygrotus (Leptolambus) impressopunctatus

41 Hyphydrus ovatus

42 Ilybius fenestratus

43 Laccodytes sp.

44 Laccophilus poecilus

45 Laccornis oblongus

46 Lachnophorini sp.

47 Lancetes sp.

48 Loricera pilicornis

49 Macrogyrus sp.

50 Manticora latipennis

51 Matus sp.

52 Megadytes sp.

Carabidae

9 Broscus cephalotes

Carabidae

Dytiscidae

8 Bidessus unistriatus

13 Calophaena bicincta

Gyrinidae

7 Andogyrus sp.

Carabidae

Cicindelidae

6 Amblycheila cylindriformis

12 Calathus sp.

Haliplidae

5 Algophilus lathridioides

Dytiscidae

Dytiscidae

4 Agabus undulatus

11 Bunites distigma

Dytiscidae

3 Agabetes acuductus

Haliplidae

Carabidae

2 Adelotopus paroensis

10 Brychius elevatus

Dytiscidae

Family

1 Acilius canaliculatus

Species_name

Cybistrinae

Matinae

Gyrininae

Loricerinae

Lancetinae

Harpalinae

Hydroporinae

Laccophilinae

Laccophilinae

Agabinae

Hydroporinae

Hydroporinae

Hydroporinae

Agabinae

Hydroporinae

Hydroporinae

Hydroporinae

Noterinae

Dytiscinae

Dytiscinae

Haliplinae

Haliplinae

Haliplinae

Hydroporinae

Paussinae

Harpalinae

Harpalinae

Copelatinae

Dytiscinae

Dytiscinae

Hydroporinae

Carabinae

Coptotominae

Copelatinae

Scaritinae

Celinae

Colymbetinae

Noterinae

Harpalinae

Harpalinae

Colymbetinae

Haliplinae

Broscinae

Hydroporinae

Gyrininae

Haliplinae

Agabinae

Laccophilinae

Harpalinae

Dytiscinae

Subfamily

Cybistrini

Matini

Manticorini

Dineutini

Loricerini

Lancetini

Lebiitae, Lachnophorini

Laccornini

Laccophilini

Laccophilini

Agabini

Hyphydrini

Hygrotini

Hydrovatini

Hydrotrupini

Hydroporini

Bidessini

Hydrodytinae

Noterini

Hyderodini

Hydaticini

Neohaliplus

Liaphlus

Haliplidius

Hydroporini

Ozaenini

Platynitae, Platynini

Dryptitae, Galeritini

Copelatini

Eretini

Dytiscini

Vatellini

Cychrini

Coptotomini

Copelatini

Clivinini

Methlini

Colymbetini

Noterini

Lebiitae, Calophaenini

Pterostichitae, Sphodrini

Colymbetini

Broscini

Bidessini

Dineutini

Manticorini

Agabini

Agabetini

Pseudomorphini

Aciliini

Supertribe, Tribe or equivalent

SRR12339100

SRR12339101

SRR12339103

SRR12339104

SRR12339105

SRR12339106

SRR12339107

SRR12339108

SRR12339109

SRR12339110

SRR12339111

SRR12339112

SRR12339114

SRR12339115

SRR12339116

SRR12339117

SRR12339118

SRR12339119

SRR12339120

SRR12339121

SRR12339122

SRR12339123

SRR12339125

SRR12339126

SRR12339127

SRR12339128

SRR12339129

SRR12339130

SRR12339131

SRR12339132

SRR12339133

SRR12339134

SRR12339136

SRR12339137

SRR12339138

SRR12339139

SRR12339140

SRR12339141

SRR12339142

SRR12339143

SRR12339144

SRR12339050

SRR12339058

SRR12339069

SRR12339080

SRR12339091

SRR12339102

SRR12339113

SRR12339124

SRR12339135

SRR12339051

SRR12339052

NCBI-SRA

SAMN15489395

SAMN15489380

SAMN15489308

SAMN15489302

SAMN15489323

SAMN15489379

SAMN15489373

SAMN15489334

SAMN15489358

SAMN15489389

SAMN15489357

SAMN15489356

SAMN15489355

SAMN15489342

SAMN15489312

SAMN15489354

SAMN15489339

SAMN15489372

SAMN15489315

SAMN15489329

SAMN15489387

SAMN15489365

SAMN15489364

SAMN15489363

SAMN15489362

SAMN15489371

SAMN15489327

SAMN15489326

SAMN15489301

SAMN15489383

SAMN15489353

SAMN15489370

SAMN15489322

SAMN15489319

SAMN15489314

SAMN15489369

SAMN15489313

SAMN15489311

SAMN15489386

SAMN15489324

SAMN15489341

SAMN15489309

SAMN15489361

SAMN15489340

SAMN15489352

SAMN15489382

SAMN15489321

SAMN15489310

SAMN15489343

SAMN15489318

SAMN15489337

SAMN15489351

1,475,199

1,539,138

730,929

1,728,525

1,701,165

1,213,049

1,308,631

1,463,810

1,906,079

2,424,354

1,600,231

1,303,854

1,081,345

814,217

1,316,434

1,078,618

1,816,009

1,612,204

1,820,881

1,482,895

1,823,398

1,888,669

2,730,884

2,709,057

1,268,679

2,551,143

1,110,028

2,222,895

1,773,505

1,695,897

1,955,387

1,518,126

1,840,940

1,610,693

1,565,817

1,695,940

1,464,339

1,433,542

2,770,449

1,786,710

2,015,071

1,507,272

2,190,501

1,795,325

1,338,798

1,493,247

1,445,537

1,918,412

1,422,789

1,602,973

1,562,994

1,576,264

547.673

199.939

181.105

232.014

296.420

116.655

42.503

304.470

691.159

294.752

96.605

614.181

453.718

1,260,612

1,341,092

654,267

1,443,307

1,512,258

979,646

1,190,884

1,204,671

1,730,790

2,053,239

1,263,608

1,083,435

911,425

627,714

1,176,182

980,897

1,619,428

1,461,245

1,554,289

1,207,930

1,571,295

1,541,106

2,457,971

2,366,957

1,100,057

2,234,816

986,807

1,943,362

1,579,672

1,465,209

1,764,884

1,222,816

1,587,342

1,213,822

1,371,847

1,452,885

1,282,909

1,237,392

64.472

440.181

7.645

151.402

928.646

594.908

245.433

96.602

463.023

909.407

133.371

120.035

224.192

61.574

126.149

272.325

136.549

281.174

621.068

103.816

283.611

323.927

508.727

705.845

415.541

305.124

483.622

779.495

161.499

654.268

127.983

173.729

213.863

120.610

420.437

213.503

66.323

428.929

48.906

31.732

11.512

32.735

22.156

25.687

24.311

21.861

8.924

12.626

28.583

33.098

22.890

22.918

16.670

26.488

13.976

17.919

14.837

22.354

21.739

17.450

11.930

17.401

27.600

14.784

26.219

10.930

29.478

21.418

26.021

41.856

11.950

15.412

26.871

21.223

20.697

14.157

9.442

13.951

14.149

26.160

12.204

13.451

22.998

49.334

17.061

9.091

21.196

17.215

23.550

33.657

48.553

27.614

11.515

31.942

18.803

19.574

22.599

21.363

8.308

10.749

27.722

32.373

21.920

22.438

15.986

24.792

13.687

17.017

13.147

21.608

20.463

15.604

10.128

13.761

25.158

14.161

21.942

9.413

28.393

18.585

25.333

40.860

11.443

14.756

24.937

20.227

20.477

12.213

8.027

11.960

13.580

25.184

11.653

12.472

22.405

49.449

16.018

7.787

19.626

16.877

20.008

30.009

Average per base coverage depth of target regions (Ct)

Average per base coverage depth of assembly (Ca)

Average per base coverage depth of non-target regions (Cn)

2,491,285

1860.950

1,633,195

1,830,923

1,298,746

1,743,167

1,477,295

1,102,301

662,713

1,261,902

1,615,555

1,272,001

1,403,858

1,324,609

1,314,765

No. of sequenced reads (pairs)

No. of clean reads (pairs)

NCBI-Biosample

1.318

13.872

0.664

4.625

41.913

23.160

10.095

4.419

51.886

72.026

4.666

3.627

9.795

2.687

7.568

10.281

9.770

15.691

41.859

4.644

13.046

18.563

42.644

40.563

15.056

20.639

18.445

71.316

5.479

30.547

4.919

4.151

17.897

7.826

15.647

10.060

3.204

30.297

197.090

39.257

14.131

6.923

19.011

22.037

5.072

0.862

17.846

76.023

13.906

5.612

26.080

13.481

Ct / Ca

1.328

15.941

0.664

4.740

49.388

30.394

10.860

4.522

55.734

84.605

4.811

3.708

10.228

2.744

7.891

10.984

9.977

16.523

47.239

4.804

13.860

20.759

50.231

51.292

16.517

21.546

22.041

82.812

5.688

35.204

5.052

4.252

18.689

8.174

16.860

10.556

3.239

35.122

231.825

45.793

14.723

7.191

19.910

23.767

5.207

0.860

19.007

88.755

15.019

5.724

30.697

15.119

Ct / Cn

605

569

177

479

550

531

545

471

577

576

516

471

508

430

542

532

508

527

556

533

567

621

642

639

514

602

542

607

556

547

548

450

565

484

565

534

458

561

605

571

573

523

574

557

446

273

523

596

557

502

554

564

92.93%

87.40%

27.19%

73.58%

84.49%

81.57%

83.72%

72.35%

88.63%

88.48%

79.26%

72.35%

78.03%

66.05%

83.26%

81.72%

78.03%

80.95%

85.41%

81.87%

87.10%

95.39%

98.62%

98.16%

78.96%

92.47%

83.26%

93.24%

85.41%

84.02%

84.18%

69.12%

86.79%

74.35%

86.79%

82.03%

70.35%

86.18%

92.93%

87.71%

88.02%

80.34%

88.17%

85.56%

68.51%

41.94%

80.34%

91.55%

85.56%

77.11%

85.10%

86.64%

76,250

76,463

32,834

65,359

86,705

67,213

80,523

60,948

96,784

97,792

64,048

61,163

59,105

50,466

72,792

69,853

63,647

66,604

83,026

66,183

73,760

86,546

102,507

98,929

70,052

92,050

80,285

113,958

73,651

84,167

75,020

56,428

77,784

60,471

69,287

73,222

56,192

87,315

124,538

93,878

85,635

70,583

86,422

81,606

53,028

27,225

70,874

94,576

78,933

63,282

78,779

71,865

No. of orthologous hits (genes recovered)

Percentage (%) of genes recovered

Total No. of amino acids

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

43

46

5

31

25

34

26

28

39

35

32

25

25

11

31

21

23

29

26

30

25

33

30

42

24

41

13

36

28

22

50

26

24

29

16

16

16

39

37

39

38

23

38

34

23

12

23

36

23

26

31

28

No. of stop symbols

No. of Xs

Table S2: Overview of species used for target sequence capture of protein-coding exons with the newly designed bait set. NCBI accession numbers for the raw reads are given for each species. The total number of sequenced reads, the number of reads after quality trimming and statistics of the enrichment efficiency are also provided (see the description in the main text), as are statistics of the orthology assignment of the sequenced genomic libraries with Orthograph (amino-acid output of Orthograph). Note: for the species Hydrocanthus oblongus, two different samples were processed and sequenced, but only one of them was included in this study. All data are available on NCBI under the BioProject number PRJNA645047.
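The enrichment ratios and the percentage of recovered genes in Table S2 can be re-derived from the other columns. The following is a minimal sketch, not the authors' script. The function names are hypothetical, and the denominator of 651 target genes is an assumption taken from the number of genes after filtering listed in Table S3; it reproduces the tabulated percentages. The example inputs are the first entries of the Ct, Ca, Cn and orthologous-hit columns.

```python
# Hypothetical helpers for re-deriving Table S2 columns (not the authors' code).

# Assumption: number of target genes, taken from "No. of genes after
# filtering" in Table S3; it reproduces the percentages listed in Table S2.
TARGET_GENES = 651


def enrichment_ratios(ct, ca, cn):
    """Return (Ct / Ca, Ct / Cn), each rounded to three decimals.

    ct: average per base coverage depth of target regions
    ca: average per base coverage depth of the whole assembly
    cn: average per base coverage depth of non-target regions
    """
    return round(ct / ca, 3), round(ct / cn, 3)


def percent_genes_recovered(hits, target_genes=TARGET_GENES):
    """Return the percentage of target genes with an orthologous hit."""
    return round(100 * hits / target_genes, 2)


# First entries of the Ct, Ca and Cn columns of Table S2.
ratios = enrichment_ratios(ct=64.472, ca=48.906, cn=48.553)
print(ratios)

# First entries of the orthologous-hit column.
for hits in (605, 569, 177):
    print(hits, percent_genes_recovered(hits))
```

A ratio well above 1 indicates that target regions were sequenced more deeply than the rest of the assembly, which is the quantity the enrichment statistics summarize.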

143

155

261

153

183

142

168

142

200

195

142

143

127

132

156

146

138

143

176

139

144

161

187

181

152

175

174

222

149

183

157

139

155

142

132

156

135

181

247

194

171

153

171

167

128

107

152

189

165

143

160

143

N50 of protein length

126

134

185

136

157

126

147

129

167

169

124

129

116

117

134

131

125

126

149

124

130

139

159

154

136

152

148

187

132

153

136

125

137

124

122

137

122

155

205

164

149

134

150

146

118

99

135

158

141

126

142

127

Average length

117

122

122

123

139.5

117

138

115

147

153.5

114.5

119

109

107.5

120

118

115.5

113

136

111

113

123

140

135

123

138.5

132.5

169

117

129

121

116

123

114

112

122

112

137

181

144

137

119

128

129

114

91

117

142.5

122

118

127

112

Median length

633

637

1136

620

643

605

643

625

743

643

568

598

633

524

643

636

493

409

540

547

676

630

608

640

626

643

643

877

643

643

639

533

584

631

505

523

557

999

779

848

664

618

643

643

407

324

626

681

643

569

571

643

Max. length

28

33

11

38

39

32

40

32

31

38

7

32

42

28

31

32

27

32

19

33

33

36

6

35

24

36

23

33

35

44

23

32

35

38

36

31

33

28

11

35

32

38

39

32

33

33

10

31

15

40

35

33

Min. length

Dytiscidae

Carabidae

Noteridae

Carabidae

Carabidae

Dytiscidae

Noteridae

Dytiscidae

Carabidae

Carabidae

Noteridae

Carabidae

Carabidae

Carabidae

Dytiscidae

Carabidae

Gyrinidae

Haliplidae

Haliplidae

Carabidae

Dytiscidae

Carabidae

Dytiscidae

Dytiscidae

Carabidae

Dytiscidae

Gyrinidae

Noteridae

Cicindelidae

Carabidae

Carabidae

Carabidae

Carabidae

Dytiscidae

Dytiscidae

Dytiscidae

Noteridae

Noteridae

Cicindelidae

Cicindelidae

Dytiscidae

Dytiscidae

Cicindelidae

53 Meridiorhantus calidus

54 Mesacanthina cribata

55 Mesonoterus laevicollis

56 Morion sp.

57 Nebria picicornis

58 Necterosoma penicillatum

59 Neohydrocoptus sp.

60 Neptosternus brevior

61 Notiobia sp.

62 Notiophilus sp.

63 Notomicrus sp.

64 Odacantha melanura

65 Omophron sp.

66 Ozaena sp.

67 Pachydrus sp.

68 Panagaeus bipustulatus

69 Patrus sp.

70 Peltodytes (Peltodytes) caesus

71 Peltodytes (Neopeltodytes) oppositus

72 Pheropsophus sp.

73 Philaccolilus sp.

74 Pinacodera sp.

75 Platambus maculatus

76 Platynectes sp.

77 Platynus sp.

78 Porhydrus lineatus

79 Porrorhynchus sp.

80 Suphisellus (Pronoterus) semipunctatus

81 Pseudoxicheila tarsalis

82 Pterostichus burmeisteri

83 Scarites subterraneus

84 Siagona sp.

85 Sphallomorpha suturalis

86 Sternhydrus atratus

87 Sternhydrus scutellaris

88 Stictotarsus duodecimpustulatus

89 Suphisellus gibbulus

90 Suphisellus tenuicornis

91 Tetracha carolina

92 Therates labiatus

93 Thermonectus basillaris

94 Thermonectus margineguttatus

95 Tricondyla aptera

Dytiscinae

Dytiscinae

Noterinae

Noterinae

Hydroporinae

Cybistrinae

Cybistrinae

Harpalinae

Siagoninae

Scaritinae

Harpalinae

Noterinae

Gyrininae

Hydroporinae

Harpalinae

Agabinae

Agabinae

Harpalinae

Laccophilinae

Brachininae

Haliplinae

Haliplinae

Gyrininae

Harpalinae

Hydroporinae

Paussinae

Omophroninae

Harpalinae

Noterinae

Nebriinae

Harpalinae

Laccophilinae

Noterinae

Hydroporinae

Nebriinae

Harpalinae

Noterinae

Colymbetinae

Collyridini, Tricondylina

Aciliini

Aciliini

Cicindelini, Theratina

Megacephalini

Noterini

Noterini

Hydroporini

Cybistrini

Cybistrini

Pseudomorphini

Siagonini

Scaritini

Pterostichitae, Pterostichini

Oxycheilini, Oxychilina

Pronoterini

Dineutini

Hydroporini

Platynitae, Platynini

Hydrotrupini

Agabini

Lebiitae, Lebiini

Laccophilini

Brachinini

Orectochilini

Chleniitae, Panagaeini

Hyphydrini

Ozaenini

Omophronini

Lebiitae, Odacanthini

Notomicrini

Nebriini

Harpalitae, Harpalini

Laccophilini

Neohydrocoptini

Hydroporini

Nebriini

Pterostichitae, Morionini

Noterini

Cicindelini, Prothymina

Colymbetini

SRR12339053

SRR12339054

SRR12339055

SRR12339056

SRR12339057

SRR12339059

SRR12339060

SRR12339061

SRR12339062

SRR12339063

SRR12339064

SRR12339065

SRR12339066

SRR12339067

SRR12339068

SRR12339070

SRR12339071

SRR12339072

SRR12339073

SRR12339074

SRR12339075

SRR12339076

SRR12339077

SRR12339078

SRR12339081

SRR12339079

SRR12339082

SRR12339083

SRR12339084

SRR12339085

SRR12339086

SRR12339087

SRR12339088

SRR12339089

SRR12339090

SRR12339092

SRR12339093

SRR12339094

SRR12339095

SRR12339096

SRR12339097

SRR12339098

SRR12339099

SAMN15489305

SAMN15489391

SAMN15489320

SAMN15489304

SAMN15489346

SAMN15489393

SAMN15489316

SAMN15489368

SAMN15489331

SAMN15489330

SAMN15489336

SAMN15489306

SAMN15489345

SAMN15489338

SAMN15489325

SAMN15489390

SAMN15489385

SAMN15489360

SAMN15489317

SAMN15489303

SAMN15489367

SAMN15489344

SAMN15489381

SAMN15489378

SAMN15489359

SAMN15489347

SAMN15489384

SAMN15489335

SAMN15489377

SAMN15489376

SAMN15489348

SAMN15489333

SAMN15489349

SAMN15489388

SAMN15489328

SAMN15489307

SAMN15489350

SAMN15489332

SAMN15489366

SAMN15489375

SAMN15489392

SAMN15489374

SAMN15489394

1,650,429

1,846,851

2,491,394

1,576,403

1,433,030

2,049,206

1,952,185

1,659,600

1,810,908

1,493,014

1,445,528

1,450,414

1,501,382

1,903,256

1,388,885

1,789,803

1,604,396

1,351,638

1,737,438

1,472,236

2,106,958

1,699,972

2,397,075

1,548,712

1,886,465

1,340,576

1,608,735

2,043,803

804,152

2,335,528

2,971,071

1,024,622

1,751,584

1,718,030

1,584,000

1,558,597

1,474,981

1,311,993

1,210,450

970,629

1,772,500

1,444,865

1,845,516

276.182

817.633

245.323

285.128

238.113

266.421

130.661

397.181

132.751

263.525

519.492

172.154

1,411,203

1,591,903

2,181,339

1,438,276

1,227,851

1,831,814

1,634,373

1,462,934

1,668,154

1,280,011

980,824

1,254,299

1,257,010

1,628,733

1,280,020

1,417,133

1,239,356

1,204,643

1,499,585

1,288,752

1,840,616

1,517,882

2,181,057

1,333,876

1,715,827

1,128,334

1,388,097

1,643,688

632,912

2,085,232

63.331

143.850

788.522

111.868

404.714

547.022

206.998

91.818

245.587

276.288

492.590

387.087

290.951

137.420

236.504

271.938

149.935

146.408

876.327

198.243

774.915

648.978

603.801

440.383

963.459

549.174

630.939

390.336

47.151

602.793

2,614,000

1764.372

887,304

1,493,215

1,541,055

1,453,974

1,330,957

1,310,717

1,153,769

1,020,650

850,325

1,242,472

1,238,473

1,543,715

24.854

28.384

42.701

14.366

28.388

8.507

6.716

31.056

18.020

36.526

25.716

21.102

10.072

17.317

22.052

10.705

23.444

29.140

20.343

23.199

22.239

28.403

10.231

25.813

21.159

27.371

16.632

12.106

8.350

20.847

16.940

17.435

9.384

17.747

13.302

14.442

15.435

12.084

22.067

21.949

12.514

37.933

17.247

24.648

26.906

33.861

13.988

26.255

7.808

6.553

30.518

16.779

33.188

22.103

18.678

9.466

16.842

20.906

10.149

22.632

28.346

16.072

21.818

18.907

24.639

9.424

23.159

18.327

24.268

14.843

10.651

8.133

19.024

14.151

15.829

8.682

17.073

12.458

13.781

14.510

11.746

20.008

20.975

11.552

29.676

16.661

9.982

2.548

5.068

18.466

7.787

14.257

64.306

30.822

2.956

13.628

7.564

19.155

18.344

28.887

7.936

10.725

25.403

6.395

5.024

43.078

8.545

34.844

22.849

59.017

17.060

45.535

20.064

37.935

32.243

5.647

28.915

104.156

15.841

87.134

13.824

21.435

16.488

17.261

10.812

17.999

6.048

21.058

13.695

2.569

5.346

23.287

7.997

15.415

70.062

31.590

3.009

14.637

8.325

22.287

20.724

30.736

8.159

11.313

26.794

6.625

5.165

54.526

9.086

40.986

26.339

64.068

19.016

52.570

22.630

42.507

36.648

5.797

31.686

124.685

17.448

94.178

14.369

22.886

17.278

18.361

11.124

19.851

6.329

22.812

17.505

10.333

470

635

638

534

512

566

445

489

542

498

514

535

539

553

545

543

515

485

586

548

577

538

544

545

558

515

523

579

412

594

604

534

588

510

545

516

529

494

535

517

514

602

540

72.20%

97.54%

98.00%

82.03%

78.65%

86.94%

68.36%

75.12%

83.26%

76.50%

78.96%

82.18%

82.80%

84.95%

83.72%

83.41%

79.11%

74.50%

90.02%

84.18%

88.63%

82.64%

83.56%

83.72%

85.71%

79.11%

80.34%

88.94%

63.29%

91.24%

92.78%

82.03%

90.32%

78.34%

83.72%

79.26%

81.26%

75.88%

82.18%

79.42%

78.96%

92.47%

82.95%

56,471

84,862

87,717

68,553

70,898

99,222

68,549

64,839

75,037

62,376

68,412

75,824

70,909

84,413

77,767

87,649

65,881

61,681

92,385

76,705

84,302

80,971

82,044

84,753

86,425

76,167

76,739

86,581

51,670

90,638

103,319

71,634

119,546

77,089

84,056

73,689

84,795

72,652

80,874

75,032

81,750

78,180

72,096

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

19

40

34

25

20

25

17

31

89

32

22

35

20

22

22

32

17

20

38

35

24

27

24

31

38

19

27

45

22

41

40

18

26

29

36

23

33

25

28

26

16

27

36

134

154

159

142

165

204

174

151

163

143

149

165

148

172

166

189

141

139

192

160

172

178

177

177

178

170

166

170

134

177

195

152

239

172

180

161

184

166

176

169

184

151

150

120

133

137

128

138

175

154

132

138

125

133

141

131

152

142

161

127

127

157

139

146

150

150

155

154

147

146

149

125

152

171

134

203

151

154

142

160

147

151

145

159

129

133

108

117

118

117

117

156

135

119

120

113

119.5

127

120

137

130

143

116

115

139

124

127

135

130

139

135

133

129

138

114

136.5

148.5

118

183.5

139

140

130

145

123

133

125

142.5

116

120

642

643

801

643

643

666

720

643

643

643

681

643

531

629

643

575

608

643

470

617

802

643

639

805

787

448

619

642

550

786

1045

533

941

735

643

591

684

761

643

670

751

643

643

40

37

32

36

39

31

31

28

29

34

30

33

32

11

8

26

44

35

33

30

30

32

29

44

13

40

27

29

12

14

30

32

31

24

36

32

9

6

30

38

17

11

32

50

40

30

40

20

20

2

3

4

5

6

7

120

120

120

120

120

120

120

Bait length (bp)

1

2

2

3

3

3

4

Number of baits per bait region

120

140

160

180

200

220

240

0.15

0.15

0.15

0.15

0.15

0.15

0.15

Length of bait regions (bp)

Cluster threshold

725

673

614

552

487

426

381

No. of genes

1,210

1,023

886

742

616

519

453

165,330

141,002

120,376

103,233

89,084

77,256

67,199

651

601

531

460

389

327

280

1040*

673

740

605

479

388

325

122,037

98,219

78,999

63,093

50,849

41,279

34,132

Results of BaitFisher for each of the different tiling design experiments

No. of CDS features

No. of bait regions

Results of BaitFilter analyses after removing baits with hits to multiple genomic regions in the genome of Bembidion sp. nr. transversale

No. of genes after filtering

No. of CDS features after filtering

No. of bait regions after filtering

*Note: Because the bait kit we used only allowed a maximum size of 6 Mbp for the baits, not all exons from experiment no. 7 were included in the final bait set (i.e. 923 exons out of a potential total of 1040 exons were targeted).

40

Bait offset

1

No. of tiling design experiment

Parameters used for different tiling design experiments with BaitFisher

Table S3: Summarized results from the different tiling design experiments with BaitFisher. The same set of species was specified in all experiments (see File S2).
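The tiling parameters listed in Table S3 appear to be linked by a simple relation: a bait region tiled with n baits of fixed length, each shifted by a constant offset, spans bait length + (n - 1) × offset base pairs. The sketch below illustrates that relation only. The function name is hypothetical, and the pairing of offsets with rows is an assumption inferred from the arithmetic, because the flattened table does not preserve row alignment.

```python
def bait_region_length(bait_length, baits_per_region, offset):
    """Length (bp) spanned by evenly offset, overlapping baits.

    Assumes every bait has the same length and each bait starts
    `offset` bp after the previous one.
    """
    if baits_per_region < 1:
        raise ValueError("a bait region needs at least one bait")
    return bait_length + (baits_per_region - 1) * offset


# (baits per region, offset) pairs; the pairing is an assumption chosen so
# that the relation reproduces the bait-region lengths listed in Table S3.
designs = [(1, 20), (2, 20), (2, 40), (3, 30), (3, 40), (3, 50), (4, 40)]

lengths = [bait_region_length(120, n, offset) for n, offset in designs]
print(lengths)  # [120, 140, 160, 180, 200, 220, 240]
```

With a single bait per region the offset has no effect, so the offset paired with the 120 bp region does not influence its length.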

2013

2017

2017

2011

2013

2017

2017

2007

2017

2018

2013

2017

2017

2017

2017

2017

2006

2017

2013

2017

2017

2010

2017

2017

2017

2017

2017

Clivina sp.

Copelatus caelatipennis

Coptotomus sp.

Cychrus sp.

Derovatellus peruanus

Dytiscus marginalis

Eretes griseus

Exocelina sp.

Galerita sp.

Glyptolenus sp.

Goniotropis sp.

Graptodytes pictus

Haliplus (Haliplidius) confinis

Haliplus (Liaphlus) laminatus

Haliplus (Neohaliplus) lineatocollis

Hydaticus pacificus

Hyderodes shuckardi

Hydrocanthus oblongus

Hydrodytes opalinus

Hydroglyphus geminus

Hydroporus erythrocephalus

Hydrotrupes palpalis

Hydrovatus fasciatus

Hygrotus (Leptolambus) impressopunctatus

Hyphydrus ovatus

Ilybius fenestratus

Laccodytes sp.

1980s

Bunites distigma

2017

2017

Brychius elevatus

Celina imitatrix

2017

Broscus cephalotes

2017

2017

Bidessus unistriatus

Caperhantus cicurius

2012

Andogyrus sp.

2017

1994

Amblycheila cylindriformis

Canthydrus sp.

2010

Algophilus lathridioides

2017

2017

Agabus undulatus

Calophaena bicincta

2017

Agabetes acuductus

2017

1990

Adelotopus paroensis

Calathus sp.

2017

Year collected

Acilius canaliculatus

Species

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

kept in 70% or less ethanol for >15 years

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 70% ethanol

pinned collection specimen, most likely initially preserved with ethyl acetate

kept in low grade ethanol for >5 years

preserved in 96% ethanol

preserved in 96% ethanol

pinned collection specimen, most likely initially preserved with ethyl acetate

preserved in 96% ethanol

Preservation

x (dry collection)

x

x

x

x

x

x

x

x (dry collection)

x

x

x

x

x

x

x

x

x

x

x

x

x

x (dry collection)

x

x

x

x

x

x

x

x

x

x

x

x

x (dry collection)

x

x

x

X

Existing voucher (DNA or tissue)

Table S4: Collection information for the processed samples used for hybrid enrichment (voucher deposited at Zoological State Collections Munich, Germany).

Collection ID

Adephaga_091

Adephaga_026

Adephaga_033

Adephaga_034

Adephaga_077

Adephaga_085

Adephaga_064

Adephaga_028

Adephaga_087

Adephaga_057

Adephaga_078

Adephaga_079

Adephaga_017

Adephaga_015

Adephaga_016

Adephaga_023

Adephaga_069

Adephaga_067

Adephaga_065

Adephaga_030

Adephaga_040

Adephaga_009

Adephaga_088

Adephaga_082

Adephaga_022

Adephaga_061

Adephaga_073

Adephaga_058

Adephaga_080

Adephaga_076

Adephaga_063

Adephaga_047

Adephaga_012

Adephaga_014

Adephaga_052

Adephaga_027

Adephaga_008

Adephaga_097

Adephaga_018

Adephaga_037

Adephaga_021

Adephaga_096

Adephaga_011

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

Biomaterial provider

2017

2010

2013

1990

2012

2007

2004

2014

2017

2017

2013

2016

2013

2017

2014

2017

2009

2017

2017

2017

2015

2017

2013

2013

2017

2017

2017

2017

2013

2013

2017

2017

2007

2017

2017

2017

2016

2017

2017

2017

2010

1998

2006

2006

2017

2017

Laccophilus poecilus

Laccornis oblongus

Lachnophorini sp.

Lancetes sp.

Loricera pilicornis

Macrogyrus sp.

Manticora latipennis

Matus sp.

Megadytes sp.

Meridiorhantus calidus

Mesacanthina cribata

Mesonoterus laevicollis

Morion sp.

Nebria picicornis

Necterosoma penicillatum

Neohydrocoptus sp.

Neptosternus brevior

Notiobia sp.

Notiophilus sp.

Notomicrus sp.

Odacantha melanura

Omophron sp.

Ozaena sp.

Pachydrus sp.

Panagaeus bipustulatus

Patrus sp.

Peltodytes (Peltodytes) caesus

Peltodytes (Neopeltodytes) oppositus

Pheropsophus sp.

Philaccolilus sp.

Pinacodera sp.

Platambus maculatus

Platynectes sp.

Platynus sp.

Porhydrus lineatus

Porrorhynchus sp.

Suphisellus (Pronoterus) semipunctatus

Pseudoxicheila tarsalis

Pterostichus burmeisteri

Scarites subterraneus

Siagona sp.

Sphallomorpha suturalis

Sternhydrus atratus

Sternhydrus scutellaris

Stictotarsus duodecimpustulatus

Suphisellus gibbulus

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

pinned collection specimen, most likely initially preserved with ethyl acetate

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

pinned collection specimen, most likely initially preserved with ethyl acetate

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

x

x (dry collection)

x

X

x

x

x

x

x (dry collection)

x

x

x

x

x

x

x (dry collection)

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x (dry collection)

x

x

x

x (dry collection)

x

x

Adephaga_059

Adephaga_035

Adephaga_093

Adephaga_102

Adephaga_099

Adephaga_053

Adephaga_045

Adephaga_046

Adephaga_066

Adephaga_006

Adephaga_010

Adephaga_024

Adephaga_051

Adephaga_029

Adephaga_025

Adephaga_048

Adephaga_092

Adephaga_071

Adephaga_020

Adephaga_019

Adephaga_013

Adephaga_038

Adephaga_107

Adephaga_070

Adephaga_044

Adephaga_054

Adephaga_055

Adephaga_049

Adephaga_068

Adephaga_105

Adephaga_056

Adephaga_101

Adephaga_043

Adephaga_074

Adephaga_004

Adephaga_041

Adephaga_003

Adephaga_001

Adephaga_090

Adephaga_098

Adephaga_031

Adephaga_083

Adephaga_095

Adephaga_072

Adephaga_081

Adephaga_075

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

2016

2017

2017

2017

2016

2017

Suphisellus tenuicornis

Tetracha carolina

Therates labiatus

Thermonectus basillaris

Thermonectus margineguttatus

Tricondyla aptera

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

preserved in 96% ethanol

x

x

x

x

x

x

Adephaga_036

Adephaga_002

Adephaga_060

Adephaga_062

Adephaga_042

Adephaga_007

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

M. Balke

0.00282

Dytiscidae

Gyrinidae

Haliplidae

Noteridae

Cicindelidae

Dytiscidae

Gyrinidae

Cicindelidae

Dytiscidae

Gyrinidae

Cicindelidae

Gyrinidae

Gyrinidae

Carabidae

Carabidae

Carabidae

Carabidae

Carabidae

Noteridae

Noteridae

Noteridae

Haliplidae

Haliplidae

Haliplidae

Dytiscidae

Dytiscidae

Cicindelidae

0.78790

0.40570

0.89130

0.04242

0.00038

0.00058

0.04848

0.00008

0.00062

0.04282

0.09944

0.07101

0.00004

p-value (Ct / Cn)

Cicindelidae

Paired test

0.91430

0.40570

0.68790

0.04242

0.00024

0.00117

0.04848

0.00004

0.00133

0.02713

0.06624

0.07138

0.00003

0.00404

p-value (Ct / Ca)

Table S5: Summarized results of the Mann-Whitney-Wilcoxon tests for the enrichment statistics in different pairs of families of Adephaga (see also Fig. 2, Fig. S4).
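For readers unfamiliar with the Mann-Whitney-Wilcoxon test, the following is a minimal pure-Python sketch of the rank-sum statistic with an exact two-sided p-value obtained by enumerating all group assignments. It is an illustration under stated assumptions, not the procedure used to produce Table S5: the function names are hypothetical, the input values are made up, and the exhaustive enumeration is only practical for small samples.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating all group assignments."""
    ranks = midranks(list(x) + list(y))
    n, m = len(x), len(y)
    mean_u = n * m / 2
    observed = abs(mann_whitney_u(x, y)[0] - mean_u)
    extreme = total = 0
    for chosen in combinations(range(n + m), n):
        u = sum(ranks[i] for i in chosen) - n * (n + 1) / 2
        total += 1
        if abs(u - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up enrichment ratios for two hypothetical families.
family_a = [1.0, 2.0, 3.0]
family_b = [4.0, 5.0, 6.0]
print(mann_whitney_u(family_a, family_b))     # (0.0, 9.0)
print(exact_two_sided_p(family_a, family_b))  # 0.1
```

For larger samples, a statistics library with a normal approximation or an optimized exact method would be the more practical choice.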

23,096

20,782

64,056

22,380

Agabetes acuductus

Agabus undulatus

Algophilus lathridioides

Amblycheila cylindriformis

51,544

17,391

43,622

38,082

Brychius elevatus

Bunites distigma

Calathus sp.

Calophaena bicincta

16,989

35,135

23,426

34,156

12,119

21,540

7,306

Hyderodes shuckardi

Hydrocanthus oblongus

Hydrodytes opalinus

Hydroglyphus geminus

Hydroporus erythrocephalus

Hydrotrupes palpalis

Hydrovatus fasciatus

11,819

25,814

Hydaticus pacificus

27,230

31,603

Haliplus (Neohaliplus) lineatocollis

Loricera pilicornis

72,706

Haliplus (Liaphlus) laminatus

Lancetes sp.

51,834

Haliplus (Haliplidius) confinis

20,702

14,791

Graptodytes pictus

Lachnophorini sp.

67,211

Goniotropis sp.

17,468

13,366

Glyptolenus sp.

66,510

72,529

Galerita sp.

Laccornis oblongus

19,504

Exocelina sp.

Laccophilus poecilus

27,077

Eretes griseus

56,800

25,412

Dytiscus marginalis

15,641

10,588

Derovatellus peruanus

Laccodytes sp.

46,647

Cychrus sp.

Ilybius fenestratus

20,477

Coptotomus sp.

11,087

18,944

Copelatus caelatipennis

Hyphydrus ovatus

23,744

Clivina sp.

13,318

18,217

Celina imitatrix

Hygrotus (Leptolambus) impressopunctatus

30,306

Caperhantus cicurius

100,981

38,321

Broscus cephalotes

Canthydrus sp.

14,175

Bidessus unistriatus

7,033

25,652

Adelotopus paroensis

Andogyrus sp.

15,151

Acilius canaliculatus

Species / Target enrichment assembly

26,273

11,806

18,914

16,733

64,560

56,659

15,308

9,725

13,172

7,231

21,093

11,916

34,000

23,353

34,523

16,319

25,251

30,902

71,452

51,028

14,276

59,757

13,134

70,845

17,851

26,888

23,701

10,507

46,062

20,246

18,697

22,733

17,990

29,931

100,317

37,862

43,161

14,520

49,918

37,350

13,920

5,792

21,987

63,455

20,581

22,793

24,889

14,982

No. of assembled contigs

No. of contigs never suspected as being contaminated

957

13

1,788

735

1,950

141

333

1,362

146

75

447

203

156

73

612

670

563

701

1,254

806

515

7,454

232

1,684

1,653

189

1,711

81

585

231

247

1,011

227

375

664

220

461

2,871

1,626

971

255

1,241

393

601

201

303

763

169

No. of putatively contaminated contigs (suspects)

715

7

524

405

1,017

86

167

666

90

58

309

116

108

56

352

182

206

283

602

511

285

1,852

96

802

723

85

702

59

218

91

81

308

126

91

352

106

182

1,497

660

318

153

333

76

249

93

140

149

100

188

0

1,194

219

731

31

67

368

8

1

69

16

12

3

226

457

332

372

460

176

57

5,555

121

824

772

80

924

9

268

110

33

675

57

268

245

94

205

1,268

782

606

38

875

64

64

72

103

155

32

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

2

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

No. of clean contigs

No. of contigs with low coverage

No. of contigs with high coverage

24

3

38

64

167

17

52

212

20

5

43

31

20

8

18

3

9

18

129

62

102

29

4

26

75

12

31

4

57

7

13

4

12

6

54

2

31

26

117

12

23

4

17

97

13

23

41

21

30

3

32

47

35

7

47

116

28

11

26

40

16

6

16

28

16

28

63

57

69

18

11

32

82

12

54

8

42

23

120

24

32

10

13

18

43

80

67

35

41

29

236

191

23

37

418

15

No. of contigs of dubious origin

No. of contaminated contigs

CroCo results

Table S6: Summarized statistics of the cross-contamination checks for each species (hybrid-enrichment data) and summarized length statistics of the clean assemblies.
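The count and percentage columns of Table S6 are arithmetically linked, which the sketch below illustrates for the first entry of each column. It is a hedged reconstruction, not the authors' or CroCo's code: the function and key names are hypothetical, and two points are assumptions inferred from the tabulated values, namely that the suspect contigs are the sum of the five categories and that percentages are truncated (not rounded) to two decimals.

```python
def croco_summary(never_suspected, clean, low_coverage, high_coverage,
                  dubious, contaminated):
    """Summarize contig counts from a cross-contamination check.

    The last five arguments are the categories into which the suspect
    contigs were sorted (assumed to sum to the number of suspects).
    """
    suspects = clean + low_coverage + high_coverage + dubious + contaminated
    total = never_suspected + suspects

    def pct(count):
        # Truncate to two decimals with integer arithmetic; truncation is
        # an assumption that matches the values listed in Table S6.
        return (count * 10000 // total) / 100

    clean_total = never_suspected + clean
    return {
        "assembled_contigs": total,
        "suspects": suspects,
        "clean_contigs": clean_total,
        "pct_clean": pct(clean_total),
        "pct_low_coverage": pct(low_coverage),
        "pct_high_coverage": pct(high_coverage),
        "pct_dubious": pct(dubious),
        "pct_contaminated": pct(contaminated),
    }


# First entry of each count column in Table S6.
summary = croco_summary(never_suspected=26273, clean=715, low_coverage=188,
                        high_coverage=0, dubious=24, contaminated=30)
for key, value in summary.items():
    print(key, value)
```

Contigs classified as clean after the check are added back to the never-suspected contigs, which gives the size of the clean assembly.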

99.11

99.94

93.89

98.11

98.59

99.90

98.93

93.72

99.57

99.76

99.35

99.28

99.85

99.92

99.25

97.12

98.61

98.67

99.10

99.43

98.44

91.66

98.98

98.78

95.23

99.61

96.02

99.79

99.21

99.31

99.12

97.03

99.44

99.06

99.69

99.70

99.36

92.09

98.12

98.29

99.28

87.08

98.58

99.45

99.48

99.29

97.60

99.54

0.69

0.00

5.76

1.25

1.09

0.05

0.42

3.31

0.06

0.01

0.32

0.13

0.03

0.01

0.64

2.68

1.28

1.17

0.63

0.33

0.38

8.26

0.90

1.13

3.95

0.29

3.63

0.08

0.57

0.53

0.17

2.84

0.31

0.88

0.24

0.24

0.46

7.29

1.51

1.58

0.26

12.44

0.28

0.09

0.34

0.44

0.60

0.21

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.01

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

Percentage (%) of clean contigs; Percentage (%) of contigs with low coverage; Percentage (%) of contigs with high coverage

0.08

0.02

0.18

0.36

0.25

0.02

0.33

1.91

0.15

0.06

0.19

0.25

0.05

0.03

0.05

0.01

0.03

0.05

0.17

0.11

0.68

0.04

0.02

0.03

0.38

0.04

0.12

0.03

0.12

0.03

0.06

0.01

0.06

0.01

0.05

0.00

0.07

0.14

0.22

0.03

0.16

0.05

0.07

0.15

0.06

0.09

0.15

0.13

Percentage of contigs of dubious origin

0.11

0.02

0.15

0.26

0.05

0.01

0.30

1.04

0.21

0.15

0.12

0.33

0.04

0.02

0.04

0.16

0.06

0.08

0.08

0.10

0.46

0.02

0.08

0.04

0.42

0.04

0.21

0.07

0.09

0.11

0.63

0.10

0.17

0.03

0.01

0.04

0.09

0.46

0.12

0.09

0.28

0.41

1.05

0.29

0.11

0.16

1.62

0.09

26,988

11,813

19,438

17,138

65,577

56,745

15,475

10,391

13,262

7,289

21,402

12,032

34,108

23,409

34,875

16,501

25,457

31,185

72,054

51,539

14,561

61,609

13,230

71,647

18,574

26,973

24,403

10,566

46,280

20,337

18,778

23,041

18,116

30,022

100,669

37,968

43,343

16,017

50,578

37,668

14,073

6,125

22,063

63,704

20,674

22,933

25,038

15,082

578

466

519

501

549

534

574

568

571

639

483

578

508

539

498

453

475

504

548

526

549

521

585

568

525

493

482

566

518

448

510

549

578

472

598

601

549

510

503

516

558

357

730

492

580

516

467

499

Percentage (%) of putatively contaminated contigs; No. of clean contigs; Mean contig length

433

400

416

414

465

456

441

440

449

465

419

444

429

428

430

401

414

415

457

437

434

448

455

481

425

433

416

424

438

400

422

439

442

424

506

473

443

413

436

431

430

347

516

437

459

422

396

409

Median contig length

Clean assembly statistics

589

446

551

478

545

535

590

566

591

673

470

588

491

517

484

449

470

485

543

507

560

530

613

581

522

492

475

586

504

432

503

555

571

469

623

610

534

488

493

493

547

397

864

490

610

496

488

487

N50 of contig lengths

15,435

15,935

10,360

9,650

10,095

16,161

11,568

11,265

9,822

6,683

14,583

9,022

8,292

9,562

16,013

9,734

11,261

12,775

11,618

13,589

8,188

16,503

8,340

8,482

11,233

15,930

12,364

8,798

13,174

9,017

10,545

10,623

14,645

11,129

14,637

8,688

15,282

11,036

9,832

11,239

7,261

8,859

19,968

11,504

12,564

13,975

8,536

15,211

Max. contig length

37,243

26,345

63,641

23,175

32,196

18,577

29,228

14,662

19,071

42,153

25,353

30,580

36,840

21,940

13,651

Peltodytes (Neopeltodytes) oppositus

Pheropsophus sp.

Philaccolilus sp.

Pinacodera sp.

Platambus maculatus

Platynectes sp.

Platynus sp.

Porhydrus lineatus

Porrorhynchus sp.

Suphisellus (Pronoterus) semipunctatus

Pseudoxicheila tarsalis

Pterostichus burmeisteri

Scarites subterraneus

Siagona sp.

Sphallomorpha suturalis

19,604

19,897

Peltodytes (Peltodytes) caesus

Tricondyla aptera

28,916

Patrus sp.

22,541

52,533

Panagaeus bipustulatus

Thermonectus margineguttatus

14,651

Pachydrus sp.

24,332

40,889

Ozaena sp.

29,215

63,799

Omophron sp.

Thermonectus basillaris

15,053

Odacantha melanura

Therates labiatus

59,413

Notomicrus sp.

18,167

30,797

Notiophilus sp.

68,301

32,881

Notiobia sp.

Tetracha carolina

31,168

Neptosternus brevior

Suphisellus tenuicornis

36,044

Neohydrocoptus sp.

69,300

22,201

Necterosoma penicillatum

Suphisellus gibbulus

16,808

Nebria picicornis

14,361

13,906

Morion sp.

Stictotarsus duodecimpustulatus

28,518

Mesonoterus laevicollis

11,807

11,796

Mesacanthina cribata

26,440

31,081

Meridiorhantus calidus

Sternhydrus scutellaris

11,736

Sternhydrus atratus

15,450

Megadytes sp.

9,254

20,010

Matus sp.

Manticora latipennis

Macrogyrus sp.

18,625

20,536

23,057

28,761

16,179

58,315

59,663

13,150

24,123

9,527

13,307

21,684

36,762

29,925

22,411

40,949

17,143

13,434

28,538

17,842

30,905

20,978

63,513

20,835

36,798

17,236

28,812

50,148

14,496

38,924

62,398

14,862

59,303

30,324

32,649

30,648

34,269

22,030

15,616

13,189

28,342

11,726

29,585

9,868

15,319

9,228

18,192

979

2,005

1,275

454

1,988

9,986

9,637

1,211

2,317

2,280

344

256

78

655

2,942

1,204

1,928

1,228

690

735

1,291

2,197

128

5,510

445

2,661

104

2,385

155

1,965

1,401

191

110

473

232

520

1,775

171

1,192

717

176

70

1,496

1,868

131

26

1,818

354

767

386

113

601

6,314

6,610

486

905

1,101

139

103

42

267

589

674

560

712

286

361

559

810

68

2,272

141

1,107

67

856

60

620

679

166

83

113

106

301

416

95

424

229

81

47

606

733

79

12

677

578

997

615

325

1,328

746

564

590

423

444

54

119

9

345

2,325

382

1,302

195

345

343

640

1,365

33

3,121

284

1,488

6

1,490

88

1,296

665

1

3

334

110

135

1,315

38

742

453

14

8

852

1,023

31

1

988

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

19

148

115

3

8

2649

2259

84

786

634

7

12

8

9

11

87

28

196

35

8

62

8

17

33

7

57

16

20

3

14

43

11

12

5

7

50

26

21

8

8

39

6

5

60

6

1

83

28

93

159

13

51

277

204

51

203

101

144

22

19

34

17

60

38

124

24

23

30

14

10

84

13

9

15

19

4

35

14

13

12

21

9

34

18

17

18

27

42

9

33

52

15

12

70

96.81

94.50

96.34

98.83

92.36

94.62

95.63

94.95

94.65

90.01

98.49

99.30

99.90

98.73

90.71

98.74

92.82

96.48

98.61

97.98

97.72

94.01

99.90

87.70

99.18

92.18

99.87

97.08

99.35

96.71

98.86

99.83

99.95

98.83

99.61

99.29

96.22

99.65

95.43

96.49

99.66

99.80

97.13

90.32

99.66

99.84

94.29

2.94

4.42

2.52

1.11

7.30

1.09

0.81

4.10

1.59

3.76

0.39

0.54

0.02

1.12

9.17

0.90

6.82

1.32

1.18

1.84

1.98

5.88

0.05

11.84

0.76

7.47

0.02

2.83

0.60

3.16

1.04

0.00

0.00

1.08

0.33

0.43

3.64

0.17

4.41

3.25

0.04

0.06

2.74

8.71

0.20

0.01

4.93

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.00

0.09

0.65

0.47

0.01

0.04

3.87

3.25

0.58

2.97

5.36

0.05

0.05

0.02

0.02

0.04

0.20

0.14

1.33

0.11

0.04

0.19

0.03

0.02

0.12

0.01

0.28

0.05

0.03

0.02

0.03

0.06

0.07

0.02

0.01

0.02

0.16

0.07

0.09

0.04

0.05

0.13

0.05

0.01

0.51

0.03

0.01

0.41

0.14

0.41

0.65

0.04

0.28

0.40

0.29

0.35

0.76

0.85

1.05

0.10

0.05

0.11

0.06

0.14

0.19

0.84

0.08

0.12

0.09

0.06

0.01

0.31

0.03

0.04

0.05

0.03

0.02

0.08

0.02

0.08

0.02

0.06

0.02

0.10

0.04

0.07

0.10

0.19

0.14

0.07

0.10

0.44

0.09

0.12

0.34

18,979

21,303

23,443

28,874

16,780

64,629

66,273

13,636

25,028

10,628

13,446

21,787

36,804

30,192

23,000

41,623

17,703

14,146

28,824

18,203

31,464

21,788

63,581

23,107

36,939

18,343

28,879

51,004

14,556

39,544

63,077

15,028

59,386

30,437

32,755

30,949

34,685

22,125

16,040

13,418

28,423

11,773

30,191

10,601

15,398

9,240

18,869

436

534

472

452

514

505

553

499

511

495

475

577

492

480

605

523

509

560

524

510

494

490

567

513

514

511

506

487

511

586

551

537

607

594

609

596

535

474

617

599

653

477

476

450

488

520

1,615

428

416

403

429

415

471

437

425

427

407

432

425

418

438

432

437

438

425

422

422

424

429

447

421

420

417

423

434

469

452

446

448

485

460

449

429

413

482

429

465

417

400

408

410

420

689

392

445

530

469

457

494

491

552

488

495

481

456

568

485

462

586

511

496

575

510

494

488

477

600

506

509

522

505

478

499

604

576

538

618

605

639

579

523

484

644

619

739

457

455

445

501

507

3,227

11,608

15,968

14,009

16,030

13,833

16,412

7,526

14,529

15,811

16,636

12,302

27,098

8,715

14,169

164,411

16,023

11,948

11,103

6,533

16,089

10,583

12,654

14,638

17,129

13,614

18,784

15,259

8,621

13,697

8,232

14,165

23,311

11,055

19,704

11,585

10,515

11,590

16,556

14,797

20,868

13,311

10,695

16,526

10,096

12,033

13,466

267,990

14.9879553005658 14.9095415607033 14.6262646776515 14.0169271717379 13.7453785344514 12.9845136243213 12.5516818518748 12.2625950880255 12.1261851428834 11.0016461065822 10.1081691569445 9.23998836614435 8.56925720515656 8.50467351425688 8.42795722618228

Noterus clavicornis

Bidessus unistriatus

Peltodytes caesus

Hydroglyphus geminus

Derovatellus peruanus

Mesacanthina cribata

Canthydrus sp.

Pseudoxicheila tarsalis

Peltodytes oppositus

Porrorhynchus sp.

Cicindela hybrida

Andogyrus sp.

Hygrotus (Leptolambus) impressopunctatus

Hyphydrus ovatus

Therates labiatus

4.61849225302622

15.9976172935927

Macrogyrus sp.

Hydrovatus sp.

16.2812441570849

5.2603106696089

16.5908715138856

Gyrinus marinus

4.63082083280957

17.0917760376562

Hydrocanthus oblongus

Tetracha carolina

17.8361334480708

Clinidium baldufi

Tricondyla aptera

18.4343336403483

Manticora latipennis

20.086940972042

Mesonoterus laevicollis

Neohydrocoptus sp.

21.0047843064449

23.1634883014588

Dineutus sp.

Suphisellus tenuicornis

23.5273688528818

21.9476118164781

28.0875072726362

Suphisellus (Pronoterus) semipunctatus

21.0625551931415

35.5918036760022

Notomicrus sp.

Suphisellus gibbulus

52.5222714837805

Patrus sp.

Pachydrus sp.

81.8293488629726

Priacma serrata

LB score

Micromalthus debilis

Species

Table S7: LB-score statistics of each species in supermatrix G based on the tree inferred under the SHETU model.

1.62364890946678

Hydroporus erythrocephalus

-5.50246986082786 -5.53795566066612 -5.55048753133522 -5.89915978514883 -6.15294705032785

Haliplus laminatus

Calophaena bicincta

Haliplus confinis

-4.1377474171876

Copelatus caelatipennis

Ozaena sp.

-4.04777931386015

Panagaeus bipustulatus

Scarites subterraneus

-3.59625411767508

Morion sp.

-5.33529238277406

-3.39165167752329

Odacantha melanura

-5.03450592251712

-3.18355006468255

Siagona sp.

Cychrus sp.

-2.93680802725671

Laccornis oblongus

Pheropsophus sp.

-2.74719850897069

Notiobia sp.

-4.21832565846587

-2.41460947784844

Haliplus lineatocollis

-1.66343455684053

Brychius elevatus

Galerita sp. -1.03862428904037

-0.902693888635164

Nebria picicornis

Adelotopus paroensis

-0.397318319569817

Eretes griseus

Celina imitatrix

0.552158453354612 -0.255889253109809

Bembidion corgenoma

1.56172563603556

1.83842247486454

Philaccolilus sp.

0.697554609830475

2.00321504504037

Porhydrus lineatus

Pinacodera sp.

2.07604755068354

Neptosternus sp.

Notiophilus sp.

2.14975525611938 2.10280584058766

Hydrodytes opalinus

2.42665718053103

Laccodytes sp.

Graptodytes pictus

2.7650535680152

2.86842991567915

Algophilus lathridioides

2.55131110143494

3.40288567571252

Stictotarsus duodecimpustulatus

Laccophilus poecilus

3.55377854622276

Loricera pilicornis

Sphallomorpha suturalis

4.18652492550238

Necterosoma penicillatum

-14.287415643254

-14.8461912820432 -14.9856870011585 -15.3084359566286

Matus sp.

Ilybius fenestratus

Broscus cephalotes

-13.8944029708267

Sternhydrus atratus

-14.8307056656241

-13.1448225043927

Elaphrus aureus

Platambus maculatus

-12.3913476059428

Amphizoa insolens

Agabetes acuductus

-12.2111425723117

-11.5835538654435

Sternhydrus scutellaris

-12.3871909509032

-11.3350674309252

Trachypachus gibbsii

Coptotomus sp.

-11.2261819104233

Platynus sp.

Amphizoa lecontei

-11.1116265535797 -11.1807823954766

-10.9461426825717

Clivina sp.

Liopterus haemorrhoidalis

-10.9292185965262

Megadytes sp.

Cybister lateralimarginalis

-9.6687544591983 -10.4889506742851

Calathus sp.

-9.48225910086887

Hygrobia nigra

Lancetes sp.

-9.08444551105834

-8.88398442684992

Hygrobia hermanni

-9.10812530864255

-8.64422426667182

Chlaenius tricolor

Sinaspidytes wrasei

-8.33851823087746

Glyptolenus sp.

Acilius canaliculatus

-8.17266250905532

-7.36803497351586

Thermonectus basillaris

-7.92106642129098

-7.30510043127757

Pterostichus burmeisteri

Exocelina sp.

-7.07418993055707

Carabus granulatus

Lachnophorini sp.

-6.92382739240991

Haliplus fluviatilis

-7.47367304119996

-6.77185663026029

Pogonus chalceus

-7.81985616884988

-6.54023200108131

Thermonectus intermedius

Goniotropis sp.

-6.37474826198299

Amblycheila cylindriformis

Omophron sp.

-6.30657933159926

Thermonectus margineguttatus

-16.1707226151063 -16.1919408399064 -16.4135486272736 -16.5091780230715 -16.6117986302361 -17.8199723143495 -17.8995553967374 -19.979759209938 -20.008885139661

Bunites distigma

Hydrotrupes palpalis

Batrachomatus nannup

Aspidytes niobe

Hydaticus pacificus

Platynectes sp.

Caperhantus cicurius

Hyderodes shuckardi

Meridiorhantus calidus -20.3963050509039

-16.1705342699442

Calosoma frigidum

Dytiscus marginalis

-15.5200822594503

Agabus undulatus

29,617, 30,692

24,276, 24,277

20,324, 22,806

20,547, 20,295

23,265, 22,931

20,813, 22,094

Supermatrix D - recoded

Supermatrix F

Supermatrix G

Supermatrix H

Supermatrix I

Supermatrix J

2,000

2,000

2,000

2,000

2,000

2,000

2,000

Burn-in

5

5

5

5

5

5

5

Sampling frequency

0.29331

0.33291

0.16444

0.22626

0.30707

0.49131

1

Maxdiff**

bpcomp

67

629

50

73

119

532

90

loglik

166

150

519

332

123

415

37

length

89

1,250

71

171

371

2,227

221

alpha

1,115

621

623

702

747

1,142

1,430

Nmode

78

272

53

82

127

1,610

315

statent

150

202

183

124

63

740

33

statalpha

tracecomp (effsize***)

***Note: According to the manual of Phylobayes values > 50 are acceptable.

937

42

557

59

74

3,183

75

rrent

**Note: According to the Phylobayes manual, maxdiff = 1 after 10,000 generations indicates that the chains have likely become stuck in a local optimum, whereas values < 0.3 are acceptable. We considered the maxdiff value for supermatrix F marginally acceptable, since it was estimated at 0.307.

*Note: For supermatrix D, fewer cycles were run due to computational limitations.

10,030, 10,246

Supermatrix D*

No. of MCMC cycles (run1 , run2)

Table S8: Summarized convergence statistics of the Bayesian phylogenetic analyses performed with the software Phylobayes under the CAT+GTR+G4 model.

18,813

20,931

17,571

18,324

22,276

17,410

6,223

rrmean

JTT+C60+F+R8

JTT+C60+F+R8

Supermatrix C

Supermatrix D

JTTDCMUT+C60+F+R7

JTTDCMUT+C60+F+R6

JTT+C60+F+R5

JTT+C60+F+R7

JTTDCMUT+C60+F+R5

Supermatrix F

Supermatrix G

Supermatrix H

Supermatrix I

Supermatrix J

**Note: It was not analyzed with the maximum-likelihood approach.

*Note: Not analyzed.

JTTDCMUT+C60+F+R9

Supermatrix E

-

JTT+C60+F+R8

Supermatrix B

Supermatrix D – recoded**

-

Best model

Supermatrix A*

Dataset

LG+R9

LG+R10

LG+R10

LG+R10

LG+R10

LG+R10

-

LG+R10

LG+F+R10

LG+R10

-

Best site-homogeneous model

843160.811

1447045.90

1081638.86

1263178.26

1667656.21

2706783.69

-

2661340.94

3365688.93

2699060.74

-

AICc for best model (AICc)

891371.387

1531968.58

1142790.64

1337483.23

1766780.96

2864913.19

-

2816918.86

3545261.41

2852218.04

-

AICc for best site-homogeneous model

Table S9: Summarized results of the model selection procedure in ModelFinder for all partitioned and unpartitioned supermatrices (IQ-TREE v. 1.6.12).

-48210.58

-84922.68

-61151.77

-74304.97

-99124.75

-158129.50

-

-155577.92

-179572.48

-153157.30

-

ΔAICc

2854344.989

3549562.162

Supermatrix B

Supermatrix C

AICc

Site-homogeneous (SHOMU)

3552120.506

2856869.834

BIC

3369952.821

2701618.882

AICc

Site-heterogeneous (SHETU)

3373178.732

2704802.417

BIC

3513279.535

2821151.198

AICc

Partitioned site-homogeneous (SHOMP)

Table S10: Comparison of different models (and partitioning schemes) for the analyzed partitioned matrices (see Table 1). Comparisons were based on a fixed neighbour-joining tree constructed with MEGA X (JTT + uniform rates).

BIC

3531601.641

2838600.738

136

136

136

136

Supermatrix B_nt

Supermatrix C_nt

Supermatrix D_nt

56,812

54,175

100,900

127,584

No. of species; No. of nucleotide alignment sites

Supermatrix A_nt

Nucleotide dataset

17,630

17,282

37,788

48,680

31.0%

31.9%

37.5%

38.2%

0.789

0.868

0.814

0.788

0.037

0.054

0.041

0.035

Parsimony informative sites; Percentage of parsimony informative sites; Overall alignment completeness score (Ca); Minimum completeness score for pairs of sequences (Cr ij)

Statistics

0.061

0.074

0.090

0.091

4.19E-02

4.70E-10

7.39E-16

1.51E-17

Average p-distance between sequences; Median pairwise p-value to the Bowker’s test

Table S11: Summarized statistics of the analyzed nucleotide supermatrices and information on what analyses were performed for each supermatrix.

3.96E-02

1.96E-08

52.02%

88.98%

Partitioned (15 tree searches, partitioned)

-

-

-

Unpartitioned under the GTR+FO*H4 model (15 tree searches)

Maximum likelihood analyses with the GHOST model (i.e., accounting for heterotachy)

Corresponding supermatrix A after keeping only second codon positions and trimming each partition with BMGE (h=0.5). Only partitions with >= 4 52.21% species, >= 80 sites, =150 nucleotide sites and > 6.60E-13 93.85% 92.56% 30% missing data from supermatrix A_nt

126.6

130.3

156.8

165.4

155.2

169.0

170.9

175.5

All genes (n = 348)

LM subset (n = 174)

SH subset (n = 174)

PI subset (n = 175)

LM+SH subset (n = 104)

LM+PI subset (n = 87)

SH+PI subset (n = 130)

LM+PI+SH (n = 74)

Set of genes

Average no. of parsimony informative sites per gene

158.5

157.0

148.0

142.5

150.0

144.5

118.5

119.0

Median no. of parsimony informative sites per gene

245.0

239.0

244.0

226.0

242.0

226.0

209.0

214.0

261.7

253.8

259.6

239.3

260.6

240.7

225.3

231.3

Median length of genes; Mean length of genes

643

643

643

643

643

643

643

643

Max. length of genes

154

150

154

150

150

150

150

150

0.168

0.219

0.170

0.161

0.227

0.205

0.154

0.206

Min. length of genes; Median proportion of missing data

0.176

0.213

0.176

0.162

0.224

0.200

0.159

0.204

Mean proportion of missing data

0.300

0.375

0.300

0.300

0.386

0.375

0.300

0.386

Max. proportion of missing data

0.073

0.073

0.073

0.050

0.073

0.050

0.044

0.044

Min. proportion of missing data

111.0

87.5

117.0

116.5

111.0

104.0

123.0

122.0

101.3

92.1

104.2

105.8

99.4

96.1

112.8

109.1

133

134

134

134

136

134

135

136

41

41

41

41

41

41

41

41

0.114

0.114

0.114

0.117

0.119

0.114

0.117

0.122

Median no. of sequences; Mean no. of sequences; Max. no. of sequences; Min. no. of sequences; Median RCFV

Table S12: Summarized statistics for the different groups (i.e., sets) of genes that were analyzed with the summary coalescent phylogenetic method. RCFV values were calculated with BaCoCa v. 1.105 (see materials and methods).

0.114

0.116

0.114

0.117

0.123

0.116

0.117

0.124

Mean RCFV

0.194

0.230

0.194

0.194

0.324

0.230

0.210

0.324

Max. RCFV

0.039

0.039

0.039

0.039

0.039

0.039

0.039

0.039

Min. RCFV

File S2

Supplementary experimental procedures

Bait design

We used the previously inferred ortholog set of Vasilikopoulos et al. (2019) and 24 transcriptomes of Adephaga (File S1: Table S1) to generate codon-based nucleotide sequence alignments of the genes in the ortholog set (see Misof et al., 2014 for details on generating the codon-based nucleotide alignments). The sequences of the reference species of the ortholog set (Harpegnathos saltator, Nasonia vitripennis, Bombyx mori, Danaus plexippus, Anopheles gambiae) were removed before generating the codon-based nucleotide sequence alignments, except for the sequences of Tribolium castaneum. Subsequently, we screened these alignments for regions suitable for bait design. The sequences of T. castaneum (already part of the reference taxon set, Vasilikopoulos et al., 2019) were kept in the alignments to allow BaitFisher to cut these alignments according to CDS features (i.e., coding exons, Mayer et al., 2016). We used the genome assembly of T. castaneum v. 5.2 (Herndon et al., 2020; Richards et al., 2008) (scaffolds downloaded from BeetleBase, last access: 28 October 2019, Kim et al., 2010) and the same version of the gene annotation (GFF file downloaded from iBeetle-Base, Dönitz et al., 2014, last access: 28 October 2019) to identify CDS boundaries within each codon-based nucleotide sequence alignment (Mayer et al., 2016). The sequences of T. castaneum were automatically excluded by BaitFisher before inferring the DNA sequences of the baits. The required taxonomic group string was specified as follows in all tiling design experiments: (Clinidium_baldufi), (Gyrinus_marinus), (Aspidytes_niobe), (Sinaspidytes_wrasei), (Cybister_lateralimarginalis), (Haliplus_fluviatilis), (Dineutus_sp), (Hygrobia_hermanni, Hygrobia_nigra), (Cicindela_hybrida), (Elaphrus_aureus), (Noterus_clavicornis), (Thermonectus_intermedius), (Carabus_granulatus, Calosoma_frigidum), (Pogonus_chalceus), (Amphizoa_lecontei, Amphizoa_insolens), (Liopterus_haemorrhoidalis), (Batrachomatus_nannup), (Metrius_contractus), (Bembidion_corgenoma), (Chlaenius_tricolor), (Trachypachus_gibbsii). The cluster threshold was set to 0.15. The rest of the options were specified according to the parameters in File S1: Table S3 separately for each tiling design experiment.

Removal of baits with multiple hits to a reference genome (Bembidion corgenoma, Gustafson et al., 2019) was performed with BaitFilter v. 1.0.5 (Mayer et al., 2016) using the following options: --blast-min-hit-coverage-of-baits-in-tiling-stack 0.80 --blast-first-hit-evalue 0.0000001 --blast-second-hit-evalue 0.00001 and by using BLAST+ v. 2.6.0 (Camacho et al., 2009). This filtering was performed separately for the set of baits that resulted from each tiling design experiment. Subsequently, the best bait region per protein-coding exon in each tiling design experiment was kept (option in BaitFilter: --mode fb). For those CDS regions that were captured in multiple tiling design experiments, only the longest bait regions among experiments were considered. This was accomplished by adding the bait regions from the different experiments to a combined file with the baits (non-redundantly for coding exons, starting from the results of experiments that allowed longer regions and then adding regions from experiments that allowed shorter bait regions) until the maximum size of ~5.99 Mbp of DNA was reached (i.e., the maximum size of bait sequences for the DNA target enrichment kit that was used: SureSelectXT2 Target Enrichment System, Agilent Technologies Inc., Santa Clara, U.S.A.). The last task was performed with a custom Perl script.
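The combination step described above can be illustrated with a minimal Python sketch. It is not the custom Perl script that was used; the record layout (exon identifier, bait-region length, total bait sequence in bp), the function name and the choice to stop at the first region that would exceed the capacity are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (exon_id, bait_region_length_bp, total_bait_sequence_bp)
Region = Tuple[str, int, int]


def combine_bait_regions(experiments: List[List[Region]],
                         max_total_bp: int) -> Dict[str, Region]:
    """Greedily merge bait regions from several tiling design experiments.

    `experiments` is assumed to be ordered from the experiment that allowed
    the longest bait regions to the one that allowed the shortest. A coding
    exon that is already covered is never added a second time.
    """
    chosen: Dict[str, Region] = {}
    used_bp = 0
    for regions in experiments:
        for region in regions:
            exon_id, _region_len, bait_bp = region
            if exon_id in chosen:
                continue  # keep the region from the earlier (longer) experiment
            if used_bp + bait_bp > max_total_bp:
                return chosen  # assumed stopping rule: capacity reached
            chosen[exon_id] = region
            used_bp += bait_bp
    return chosen


# Toy example with made-up numbers (not taken from the study).
long_regions = [("exon1", 360, 1000), ("exon2", 360, 1500)]
short_regions = [("exon1", 240, 800), ("exon3", 240, 700), ("exon4", 240, 900)]
combined = combine_bait_regions([long_regions, short_regions], max_total_bp=3500)
```

In this toy example exon1 keeps its longer region from the first experiment, exon3 is added from the second experiment, and exon4 is left out because it would exceed the assumed capacity.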

Hybrid enrichment of target genomic DNA sequences

For enriching the target gDNA in the indexed libraries, we followed the procedure for capture library size >3.0 Mb outlined in Agilent's SureSelectXT2 Target Enrichment System Protocol for Illumina Paired-End Multiplexed Sequencing (Version E1 published in June 2015, pages 55–74), with minor modifications (see Bank et al., 2017). We used a SureSelectXT2 Custom 5.99 Mbp library of 49,786 120-bp-long baits and pooled the indexed libraries (8 samples per pool) before the hybridization reaction. After pooling the libraries, the total volume of the pools was reduced to 3.5 μl with a SpeedVac® SPD 111V (ThermoFisher Scientific, Waltham, MA, USA). Hybridization with the baits was allowed for 48 h at 65 °C in a GeneAmp PCR System 2720. We then performed the physical separation of the target DNA from the remaining DNA by adding 50 μl Dynabeads MyOne Streptavidin T1 beads and by incubating the mixture for 30 min at room temperature (Bank et al., 2017). After washing the beads, the captured DNA was re-suspended in 30 μl nuclease-free water and post-amplified in an on-bead PCR reaction (Bank et al., 2017). For post-amplification, we followed Agilent's protocol by applying the PCR cycling program for a capture library size of > 1.5 Mb with a slightly increased cycle number as described by Bank et al. (2017). We purified the amplified target DNA with AMPure XP beads in a ratio of 1:0.75 to remove oligonucleotide primer dimers and to further select fragments with a length between 200 and 500 bp (Bank et al., 2017). Each of the twelve library pools was eluted in 40 μl nuclease-free water and checked for quality and quantity with a Fragment Analyzer and a Quantus Fluorometer.

Quality trimming of raw genomic reads and assembly of the target-DNA enrichment data

Raw reads and bases of poor quality as well as Illumina adapter sequences were removed with Trimmomatic v. 0.38 (Bolger et al., 2014) using the following options: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25 (note: the core Illumina adapter sequence is the same in the TruSeq and NEBNext library preparation kits). Genomic assemblies were generated with IDBA-UD v. 1.1.3 (Peng et al., 2012), compiled with the customizations described by Mayer et al. (2016) so that the software can handle read lengths of 150 bp, and using the options: --step 5 --maxk 120.
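To make the effect of the SLIDINGWINDOW:4:15 and MINLEN:25 settings more concrete, the following Python sketch mimics the general idea of sliding-window quality trimming on a list of Phred scores. It is a simplified illustration and not Trimmomatic's actual implementation; the function name is hypothetical, and adapter clipping as well as the LEADING and TRAILING steps are deliberately left out.

```python
from typing import List


def sliding_window_trim(quals: List[int], window: int = 4,
                        min_avg: float = 15.0, min_len: int = 25) -> int:
    """Return the number of bases kept from the 5' end (0 = read discarded).

    The read is scanned from the 5' end; it is cut at the start of the first
    window whose average quality falls below `min_avg`. Reads shorter than
    `min_len` after trimming are discarded.
    """
    keep = len(quals)
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            keep = start
            break
    return keep if keep >= min_len else 0


# Toy reads with made-up quality scores.
good_then_bad = [30] * 30 + [2] * 10   # quality collapses after 30 bases
short_good = [30] * 10 + [2] * 10      # too short after trimming
all_good = [30] * 40                   # nothing to trim
```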

Sequencing, assembly and cleaning of new transcriptomes and exploitation of previously published transcriptomes

We included 38 transcriptomes in our combined dataset for inferring the phylogeny of Adephaga. Thirty-six of these transcriptomes have been used in other phylogenetic studies (File S1: Table S1) (Boussau et al., 2014; McKenna et al., 2019; Misof et al., 2014; Pauli et al., 2016; Peters et al., 2014; Pflug et al., 2020; Seppey et al., 2019; Van Belleghem et al., 2012; Vasilikopoulos et al., 2020, 2019). The transcriptomes of Chlaenius tricolor and B. corgenoma were included here for the first time in a phylogenetic analysis, although the transcriptome of B. corgenoma has already been published in another study (Pflug et al., 2020; as Bembidion sp. nr. transversale, see File S1: Table S1).

Libraries for C. tricolor were prepared at Oregon State University. In short, mRNA was isolated using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, Ipswich, MA, U.S.A.), and libraries were constructed with the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs). The fragment size distribution of each library was characterized with a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, U.S.A.) using the High Sensitivity DNA Analysis Kit and 1 μl of sample. These libraries were then sequenced on an Illumina HiSeq 2000 at Oregon State University.

Reads for the transcriptome of C. tricolor were trimmed with Trimmomatic and assembled with Trinity (v. r20140413) (Grabherr et al., 2011), both implemented within the Agalma v. 0.5 pipeline, using default parameters for all parts of the pipeline (Dunn et al., 2013).

Cross-contamination checks and vector contamination screening for transcriptomes

Most analyzed transcriptomes have been previously checked for cross-species contaminations and vector contaminations (see File S1: Table S1). Cross-contamination checks were only performed here for the transcriptomes of C. tricolor, Trachypachus gibbsii, B. corgenoma and Amphizoa insolens because these transcriptomes were initially processed in the same laboratory at Oregon State University. The cross-contamination check was performed among the transcriptomes of these four species and also included other transcriptomes processed in the same laboratory (not included in our analyses) using CroCo v. 1.1 (Simion et al., 2018) with the following options: --tool: Kallisto, --fold-threshold: 2, --trim5: 0, --trim3: 0, --minimum-coverage: 0.2, --suspect_id: 99, --suspect_len: 200, --overexpression: 300. The filtered transcriptome assemblies were subsequently screened for vector contaminations using the UniVec v. 10.0 database as described by Misof et al. (2014). The raw sequenced reads of the transcriptome of C. tricolor have been deposited at the NCBI-SRA database (File S1: Table S1).

Orthology assignment for target DNA-enrichment data and transcriptomes

Because Orthograph is designed to process transcriptomic data (Petersen et al., 2017), we changed the default exonerate model for orthograph-reporter to “protein2dna” for processing the genomic libraries (option: --exonerate-alignment-model). The protein2dna model aligns the query protein sequence to a DNA sequence, incorporating all the appropriate gaps and frameshifts but without modelling introns (Slater & Birney, 2005). The orthology assignment for the transcriptomes was performed with the default exonerate model (i.e., “protein2genome”). The rest of the options were identical for all datasets: max-blast-searches = 50, blast-max-hits = 50, orf-overlap-minimum = 0.5, extend-orf = 1, minimum-transcript-length = 30, substitute-u-with = X and otherwise default options.

Manual inspection and curation of amino-acid alignments

When manually inspecting the amino-acid alignments, we observed in a few instances that the hybrid-capture data of a small number of species shared the same few amino-acid residues in a column, whereas these residues were not observed in that column in the sequences of the transcriptomes or the genome reference data. When this phenomenon was observed at the borders of the captured regions and all transcriptome or genome data were homogeneous in terms of amino-acid residues for that column, we considered that these residues could represent sequencing errors, frameshift errors or intronic residues (see Bank et al., 2017), and we masked them manually with an X. We acknowledge that this might have masked a few true amino-acid residues at the borders of the captured regions, but we suggest that it is better to follow a conservative approach intended to remove erroneous sequence data rather than to include data with potentially erroneous phylogenetic signal.

Outlier sequence removal based on pairwise BLOSUM62 distances using a sliding window approach in the multiple sequence alignments of individual genes

We screened the amino-acid multiple sequence alignments for outlier sequences before removing randomly similar sections with ALISCORE (Fig. 1). Outlier identification and removal was performed with the same procedure described by Dietz et al. (2019) (the OliInSeq program is available upon request from C. Mayer). Sequences identified as outliers in 25% or more of the sequence windows were removed completely from the multiple sequence alignment. The window size for the pairwise comparisons was set to 20 amino acids. Corresponding outlier sequences were subsequently also removed from the codon-based nucleotide sequence data.
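The 25% rule stated above can be sketched in a few lines of Python. This sketch does not reproduce OliInSeq: how a sequence is flagged as an outlier within a single window (based on pairwise BLOSUM62 distances) is not modelled here, so the per-window flags are taken as a hypothetical input, and the function name is an assumption.

```python
from typing import Dict, List


def sequences_to_remove(window_flags: Dict[str, List[bool]],
                        min_fraction: float = 0.25) -> List[str]:
    """Return the sequences flagged as outliers in >= `min_fraction` of windows.

    `window_flags[name][k]` is True if sequence `name` was classified as an
    outlier in window k (the classification itself is assumed to be done
    elsewhere).
    """
    removed = []
    for name, flags in window_flags.items():
        if not flags:
            continue  # no window evaluated for this sequence
        if sum(flags) / len(flags) >= min_fraction:
            removed.append(name)
    return sorted(removed)


# Toy example with made-up flags for three sequences.
flags = {
    "species_a": [True, False, False, False],      # 25% of windows -> removed
    "species_b": [False, False, False, False],     # never an outlier
    "species_c": [True, True] + [False] * 7,       # 2 of 9 windows -> kept
}
to_remove = sequences_to_remove(flags)
```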

Controlling for among-species compositional heterogeneity, removal of distantly related outgroup species and removal of ingroup species with long branches

In order to reduce the sensitivity of our phylogenetic analyses to compositional heterogeneity among species, we generated and analyzed a Dayhoff6-recoded version of supermatrix D. As an alternative approach to reduce among-species compositional heterogeneity in the data, we generated another, independent supermatrix by keeping only the 50% of genes with the lowest degree of among-species compositional heterogeneity (RCFV values calculated with BaCoCa v. 1.105, Kück & Struck, 2014). The 322 compositionally homogeneous (i.e., with reduced RCFV values) genes were then concatenated into a new supermatrix, which was subsequently trimmed with BMGE (h = 0.5, BLOSUM62) to remove hypervariable sites (supermatrix I, Table 1 of the main text).
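As an illustration of what Dayhoff6 recoding does to an amino-acid sequence, the following Python sketch maps each residue to one of the six Dayhoff categories (AGPST, DENQ, HKR, ILMV, FWY, C). The digit symbols used for the six categories and the decision to leave gaps and ambiguous characters unchanged are assumptions for illustration; the recoding tool and symbols actually used for supermatrix D are not specified here.

```python
# Six Dayhoff categories; the digits 0-5 are arbitrary labels (assumption).
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}

# Invert the grouping into a residue -> category lookup table.
RESIDUE_TO_CLASS = {
    residue: label
    for label, residues in DAYHOFF6_GROUPS.items()
    for residue in residues
}


def dayhoff6_recode(sequence: str) -> str:
    """Recode an amino-acid sequence into the six Dayhoff categories.

    Characters that are not one of the 20 standard amino acids (gaps,
    X, ?) are passed through unchanged.
    """
    return "".join(RESIDUE_TO_CLASS.get(aa, aa) for aa in sequence.upper())


recoded = dayhoff6_recode("ACDEFGHIKLMNPQRSTVWY-X")
```

Recoding reduces the 20-state alphabet to six states, so substitutions within a category become invisible to the model, which is the intended way of dampening compositional bias at the cost of some phylogenetic information.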

We tested whether removal of distantly related outgroup species affected the inferred phylogenetic relationships, as has been previously suggested for other taxonomic groups (Philippe et al., 2009; Pisani et al., 2015). For this purpose, we generated one additional matrix by removing distantly related outgroup species from supermatrix F (Table 1). Species-specific LB scores were calculated with TreSpEx v. 1.1 (Struck, 2014) using the tree inferred from supermatrix F under the site-heterogeneous model (SHETU). Subsequently, all outgroup species except the two species of Archostemata were removed (16 species removed). From the putative closest outgroup clade of Adephaga, which comprises the suborders Archostemata + Myxophaga (McKenna et al., 2019), we kept the two species of Archostemata because they had lower LB scores than the two species of Myxophaga. This filtering resulted in supermatrix G (n = 120 species).

In a second step, we tested whether removal of long-branched ingroup species affected phylogenetic reconstructions. Specifically, we recalculated LB scores based on the tree inferred under the SHETU model for supermatrix G and removed the 20 ingroup species with the highest LB scores (File S1: Table S7). This removal resulted in supermatrix H (n = 100 species).
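The ranking of species by LB score can be sketched as follows. The sketch assumes the taxon-specific LB score as we understand it from Struck (2014): the mean patristic distance of a taxon to all other taxa, divided by the mean patristic distance over all taxon pairs, minus one, times 100. The distance-matrix layout, the function names and the toy values are hypothetical; TreSpEx itself was used for the actual calculations.

```python
from typing import Dict, List


def lb_scores(dist: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Taxon-specific LB scores (in percent) from pairwise patristic distances.

    `dist[a][b]` is the patristic distance between taxa a and b; the
    matrix is assumed to be symmetric and to contain at least two taxa.
    """
    taxa = sorted(dist)
    n = len(taxa)
    # Mean distance of each taxon to all other taxa.
    mean_per_taxon = {
        t: sum(dist[t][u] for u in taxa if u != t) / (n - 1) for t in taxa
    }
    # Mean distance over all unordered taxon pairs.
    pair_sum = sum(dist[t][u] for i, t in enumerate(taxa) for u in taxa[i + 1:])
    mean_all = pair_sum / (n * (n - 1) / 2)
    return {t: (mean_per_taxon[t] / mean_all - 1.0) * 100.0 for t in taxa}


def highest_lb(scores: Dict[str, float], candidates: List[str], k: int) -> List[str]:
    """Return the k candidate taxa with the highest LB scores."""
    return sorted(candidates, key=lambda t: scores[t], reverse=True)[:k]


# Toy distances: taxon C sits on a long branch relative to A and B.
toy = {
    "A": {"A": 0.0, "B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "B": 0.0, "C": 3.0},
    "C": {"A": 3.0, "B": 3.0, "C": 0.0},
}
scores = lb_scores(toy)
flagged = highest_lb(scores, ["A", "B", "C"], k=1)
```

Restricting `candidates` to ingroup species and setting `k=20` would correspond to the filtering step that produced supermatrix H.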

Calculation of supermatrix statistics

We generated substitution saturation plots for most of the analyzed amino-acid supermatrices. Pairwise patristic distances and p-distances were calculated with TreSpEx v. 1.1 (Struck, 2014) from the best maximum-likelihood (ML) trees and their corresponding amino-acid supermatrices. The saturation plots were then drawn in R v. 3.6.3 (R Core Team, 2020).
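For readers unfamiliar with the quantities plotted in a saturation plot, the p-distance between two aligned sequences is simply the proportion of compared sites at which they differ. The Python sketch below is illustrative only: skipping sites where either sequence has a gap, `?` or `X` is an assumption of this example and may differ from how TreSpEx handles such sites.

```python
def p_distance(seq_a: str, seq_b: str, missing: str = "-?X") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a character listed in `missing`
    are skipped. Returns NaN when no site can be compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        return float("nan")
    return differences / compared


# Five comparable sites, one difference (K vs R).
example = p_distance("MKLV-A", "MRLVQA")
```

Plotting such p-distances against the patristic distances from the ML tree shows saturation as a flattening of the relationship at large patristic distances.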

Model selection and phylogenetic inference in concatenation-based analyses

Phylogenetic tree reconstructions and model selection analyses were conducted with IQ-TREE v. 1.6.12 (Nguyen et al., 2015). Model selection for all supermatrices (unpartitioned, B–J) also tested the relative fit of empirical profile site-heterogeneous mixture models (option: -mfreq FU, F, C20+F, C40+F, C60+F). During model selection, these amino-acid frequency profiles were combined with the exchange rates of the most commonly used single-matrix amino-acid models: LG (Le & Gascuel, 2008), WAG (Whelan & Goldman, 2001), JTT (Jones et al., 1992), JTTDCMUT, and DCMUT (Kosiol & Goldman, 2005). Additionally, the model selection procedure on the unpartitioned matrices (B–J) involved all possible combinations for modelling among-site rate heterogeneity in the data (options: -mrate E,I,G,I+G,R -cmin 4) (Kalyaanamoorthy et al., 2017). Lastly, we also included the LG4X and LG4M mixture models in our model selection procedure (Le et al., 2012). For the partitioned supermatrices (B, C), an optimal partitioning scheme with site-homogeneous models was inferred using the rcluster algorithm (Lanfear et al., 2014) in IQ-TREE 1.6.12 (options: -m MFP+MERGE -rcluster 10 -rcluster-max 5000 -merit AICc -mset LG, WAG, JTT, JTTDCMUT, DCMUT -madd LG4X, LG4M -mrate E, I, G, I+G, R).

We inferred 15 maximum-likelihood trees under the site-homogeneous partitioned models and 15 trees under the unpartitioned site-homogeneous models (i.e., the site-homogeneous models that showed the best fit to the datasets when site-heterogeneous models were excluded). Only one tree search for each supermatrix was conducted under the better-fitting site-heterogeneous models due to computational limitations. Lastly, we performed 15 independent tree searches with the approximation to the site-heterogeneous models (PMSF) (Wang et al., 2018), using the best tree that resulted from the analyses under the unpartitioned site-homogeneous model as a guide tree. All maximum-likelihood tree searches were performed using random starting trees (option: -t RANDOM), and the best tree in each analysis (among the 15 inferred trees wherever applicable) was selected based on the log-likelihood scores.

The relative fit of partitioned, unpartitioned site-homogeneous and unpartitioned site-heterogeneous models was assessed on a fixed neighbor-joining tree computed with MEGA X v. 10.0.5 (Kumar et al., 2018) using the JTT model with uniform rates. Partitioned and unpartitioned models were then compared in IQ-TREE (same version) on this fixed neighbor-joining tree using both the AICc and BIC criteria (File S1: Table S10).
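The AICc and BIC criteria used for the model comparison follow standard formulas, which the Python sketch below spells out. The log-likelihood, the number of free parameters and the sample size in the example are invented for illustration; taking the sample size as the number of alignment sites is an assumption of this sketch.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion for a model with k free parameters."""
    return 2.0 * k - 2.0 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction; n is the sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion."""
    return k * math.log(n) - 2.0 * log_likelihood


# Invented values: lnL = -1000, 10 free parameters, 1011 sites.
example_aicc = aicc(-1000.0, 10, 1011)
example_bic = bic(-1000.0, 10, 1011)
```

For both criteria, lower values indicate a better trade-off between fit and parameter count, so competing models evaluated on the same fixed tree and the same alignment can be ranked directly.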

Optimization of the partitioning scheme and model selection for the nucleotide sequence data (option: -st DNA) were performed in IQ-TREE v. 1.6.12 with the following options: -m MFP+MERGE -mset GTR, K2P, F81, TN, JC, HKY -rcluster-max 5000 -rcluster 10 -mrate E, I, G, I+G, R. The unpartitioned nucleotide sequence datasets were analyzed using the same combinations of models as above. We performed 15 independent maximum-likelihood tree searches under the best-fitting models in all cases and calculated branch support based on 2000 ultrafast bootstraps (Hoang et al., 2018). For supermatrix nt_B we also performed 15 independent tree searches using the unpartitioned GTR+FO*H4 model, which accounts for heterotachy among sequences (Crotty et al., 2020). Deviation of the nucleotide supermatrices from stationary, reversible and homogeneous conditions was assessed, and completeness scores were calculated, using the same statistical criteria and software as for the amino-acid supermatrices (File S1: Table S11, see main text).
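To make the option string for the nucleotide partition-merging step easier to reuse, the Python sketch below assembles it into an argument list. Only the options quoted in the text are taken from the source; the executable name `iqtree`, the file names, the output prefix and the use of `-spp` to supply the partition file are assumptions of this example. The comma-separated lists are written without spaces, as expected on the command line.

```python
from typing import List


def merge_partitions_dna_command(alignment: str,
                                 partition_file: str,
                                 prefix: str) -> List[str]:
    """Build an IQ-TREE 1.6 argument list for DNA partition merging.

    The list is only constructed here, not executed; it could be passed
    to subprocess.run(...) on a system where IQ-TREE is installed.
    """
    return [
        "iqtree",                   # assumed executable name
        "-s", alignment,            # hypothetical alignment file
        "-spp", partition_file,     # assumed way of passing the partitions
        "-st", "DNA",
        "-m", "MFP+MERGE",
        "-mset", "GTR,K2P,F81,TN,JC,HKY",
        "-rcluster-max", "5000",
        "-rcluster", "10",
        "-mrate", "E,I,G,I+G,R",
        "-pre", prefix,             # hypothetical output prefix
    ]


cmd = merge_partitions_dna_command("nt_B.fas", "nt_B_partitions.nex", "nt_B_merge")
command_line = " ".join(cmd)
```

Keeping the command as a list rather than a single string avoids shell quoting problems if it is later executed programmatically.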

Single gene-tree phylogenetic inference and likelihood mapping analyses for individual genes

For inferring the gene trees, we first selected the best-fitting substitution model for each of the 348 genes in IQ-TREE 1.6.12 based on the AICc criterion, using the same set of site-homogeneous models as above (-merit AICc, cmax, cmin: default), and inferred 10 gene trees per gene using the random starting tree option (-t RANDOM). For each gene, we then selected the tree with the best log-likelihood score for downstream analyses. Branch support for each gene tree was estimated based on 10000 SH-aLRT replicates (Guindon et al., 2010), because this measure was previously shown to outperform other branch support measures for identifying dubious clades in single gene-tree analyses (Simmons & Kessenich, 2020). Branches with support values lower than 50% were collapsed before performing summary coalescent analyses for all subsets of gene trees (see main text). To measure the phylogenetic informativeness of individual genes, we performed likelihood-mapping analyses with the same version of IQ-TREE and the best-fitting models. We considered all possible quartets for each gene (options -lmap ALL -wql), and the proportion of fully resolved quartets was calculated with a custom Python script.
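The collapsing of weakly supported branches can be sketched on a simple nested representation of a tree. This is not the procedure or software used in the study: the tree encoding (a leaf is a string, an internal node is a tuple of support value and child list), the function name and the toy tree are hypothetical, and branch lengths are ignored because only the topology matters for this illustration.

```python
from typing import List, Optional, Tuple, Union

# A leaf is its taxon name; an internal node is (support, children).
# The root carries support None because it has no subtending branch.
Node = Union[str, Tuple[Optional[float], List["Node"]]]


def collapse_low_support(node: Node, threshold: float = 50.0) -> Node:
    """Return a copy of the tree with branches below `threshold` collapsed.

    A collapsed internal node is dissolved and its children are attached
    directly to its parent, which creates a polytomy.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children: List[Node] = []
    for child in children:
        child = collapse_low_support(child, threshold)
        is_internal = not isinstance(child, str)
        if is_internal and child[0] is not None and child[0] < threshold:
            new_children.extend(child[1])  # dissolve weakly supported branch
        else:
            new_children.append(child)
    return (support, new_children)


# Toy tree: (A,(B,C)30,(D,(E,F)10)90);
toy_tree = (None, ["A", (30.0, ["B", "C"]), (90.0, ["D", (10.0, ["E", "F"])])])
collapsed = collapse_low_support(toy_tree)
```

The strict comparison mirrors the wording "lower than 50%", so a branch with a support value of exactly 50 is retained.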

References

Bank, S., Sann, M., Mayer, C. et al. (2017) Transcriptome and target DNA enrichment sequence data provide new insights into the phylogeny of vespid wasps (Hymenoptera: Aculeata: Vespidae). Molecular Phylogenetics and Evolution, 116, 213–226.

Bolger, A.M., Lohse, M. & Usadel, B. (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30, 2114–2120.
Boussau, B., Walton, Z., Delgado, J.A. et al. (2014) Strepsiptera, phylogenomics and the long branch attraction problem. PLoS One, 9, e107709.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K. & Madden, T.L. (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10, 421.
Crotty, S.M., Minh, B.Q., Bean, N.G., Holland, B.R., Tuke, J., Jermiin, L.S. & von Haeseler, A. (2020) GHOST: recovering historical signal from heterotachously evolved sequence alignments. Systematic Biology, 69, 249–264.
Dietz, L., Dömel, J.S., Leese, F., Mahon, A.R. & Mayer, C. (2019) Phylogenomics of the longitarsal Colossendeidae: the evolutionary history of an Antarctic sea spider radiation. Molecular Phylogenetics and Evolution, 136, 206–214.
Dönitz, J., Schmitt-Engel, C., Grossmann, D. et al. (2014) iBeetle-Base: a database for RNAi phenotypes in the red flour beetle Tribolium castaneum. Nucleic Acids Research, 43, D720–D725.
Dunn, C.W., Howison, M. & Zapata, F. (2013) Agalma: an automated phylogenomics workflow. BMC Bioinformatics, 14, 330.
Grabherr, M.G., Haas, B.J., Yassour, M. et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29, 644–652.
Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology, 59, 307–321.
Gustafson, G.T., Alexander, A., Sproul, J.S., Pflug, J.M., Maddison, D.R. & Short, A.E.Z. (2019) Ultraconserved element (UCE) probe set design: base genome and initial design parameters critical for optimization. Ecology and Evolution, 9, 6933–6948.

Herndon, N., Shelton, J., Gerischer, L. et al. (2020) Enhanced genome assembly and a new official gene set for Tribolium castaneum. BMC Genomics, 21, 47.
Hoang, D.T., Chernomor, O., von Haeseler, A., Minh, B.Q. & Le, S.V. (2018) UFBoot2: improving the ultrafast bootstrap approximation. Molecular Biology and Evolution, 35, 518–522.
Jones, D.T., Taylor, W.R. & Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences, 8, 275–282.
Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K.F., von Haeseler, A. & Jermiin, L.S. (2017) ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods, 14, 587–589.
Kim, H.S., Murphy, T., Xia, J. et al. (2010) BeetleBase in 2010: revisions to provide comprehensive genomic information for Tribolium castaneum. Nucleic Acids Research, 38, D437–D442.
Kosiol, C. & Goldman, N. (2005) Different versions of the Dayhoff rate matrix. Molecular Biology and Evolution, 22, 193–199.
Kück, P. & Struck, T.H. (2014) BaCoCa - a heuristic software tool for the parallel assessment of sequence biases in hundreds of gene and taxon partitions. Molecular Phylogenetics and Evolution, 70, 94–98.
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution, 35, 1547–1549.
Lanfear, R., Calcott, B., Kainer, D., Mayer, C. & Stamatakis, A. (2014) Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology, 14, 82.
Le, S.Q., Dang, C.C. & Gascuel, O. (2012) Modeling protein evolution with several amino acid replacement matrices depending on site rates. Molecular Biology and Evolution, 29, 2921–2936.

Le, S.Q. & Gascuel, O. (2008) An improved general amino acid replacement matrix. Molecular Biology and Evolution, 25, 1307–1320.
Mayer, C., Sann, M., Donath, A. et al. (2016) BaitFisher: a software package for multispecies target DNA enrichment probe design. Molecular Biology and Evolution, 33, 1875–1886.
McKenna, D.D., Shin, S., Ahrens, D. et al. (2019) The evolution and genomic basis of beetle diversity. Proceedings of the National Academy of Sciences, 116, 24729–24737.
Misof, B., Liu, S., Meusemann, K. et al. (2014) Phylogenomics resolves the timing and pattern of insect evolution. Science, 346, 763–767.
Nguyen, L.-T., Schmidt, H.A., von Haeseler, A. & Minh, B.Q. (2015) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32, 268–274.
Pauli, T., Vedder, L., Dowling, D. et al. (2016) Transcriptomic data from panarthropods shed new light on the evolution of insulator binding proteins in insects. BMC Genomics, 17, 861.
Peng, Y., Leung, H.C.M., Yiu, S.M. & Chin, F.Y.L. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28, 1420–1428.
Peters, R.S., Meusemann, K., Petersen, M. et al. (2014) The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data. BMC Evolutionary Biology, 14, 52.
Petersen, M., Meusemann, K., Donath, A. et al. (2017) Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes. BMC Bioinformatics, 18, 111.
Philippe, H., Derelle, R., Lopez, P. et al. (2009) Phylogenomics revives traditional views on deep animal relationships. Current Biology, 19, 706–712.

Pflug, J.M., Holmes, V.R., Burrus, C., Johnston, J.S. & Maddison, D.R. (2020) Measuring genome sizes using read-depth, k-mers, and flow cytometry: methodological comparisons in beetles (Coleoptera). G3 Genes|Genomes|Genetics, 10, 3047–3060.
Pisani, D., Pett, W., Dohrmann, M. et al. (2015) Genomic data do not support comb jellies as the sister group to all other animals. Proceedings of the National Academy of Sciences, 112, 15402–15407.
R Core Team (2020) R: A Language and Environment for Statistical Computing.
Richards, S., Gibbs, R.A., Weinstock, G.M. et al. (2008) The genome of the model beetle and pest Tribolium castaneum. Nature, 452, 949–955.
Seppey, M., Ioannidis, P., Emerson, B.C. et al. (2019) Genomic signatures accompanying the dietary shift to phytophagy in polyphagan beetles. Genome Biology, 20, 98.
Simion, P., Belkhir, K., François, C. et al. (2018) A software tool “CroCo” detects pervasive cross-species contamination in next generation sequencing data. BMC Biology, 16, 28.
Simmons, M.P. & Kessenich, J. (2020) Divergence and support among slightly suboptimal likelihood gene trees. Cladistics, 36, 322–340.
Slater, G.S.C. & Birney, E. (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics, 6, 31.
Struck, T.H. (2014) TreSpEx-detection of misleading signal in phylogenetic reconstructions based on tree information. Evolutionary Bioinformatics, 10, 51–67.
Van Belleghem, S.M., Roelofs, D., Van Houdt, J. & Hendrickx, F. (2012) De novo transcriptome assembly and SNP discovery in the wing polymorphic salt marsh beetle Pogonus chalceus (Coleoptera, Carabidae). PLoS One, 7, e42605.
Vasilikopoulos, A., Balke, M., Beutel, R.G. et al. (2019) Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Molecular Phylogenetics and Evolution, 135, 270–285.

Vasilikopoulos, A., Misof, B., Meusemann, K. et al. (2020) An integrative phylogenomic approach to elucidate the evolutionary history and divergence times of Neuropterida (Insecta: Holometabola). BMC Evolutionary Biology, 20, 64.
Wang, H.-C., Minh, B.Q., Susko, E. & Roger, A.J. (2018) Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Systematic Biology, 67, 216–235.
Whelan, S. & Goldman, N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular Biology and Evolution, 18, 691–699.

Fig. S1: Flowchart summarizing the steps for generating the different amino-acid supermatrices after processing of the individual COGs (see also Fig. 1 of the main text).

[Flowchart not reproduced. Nodes: starting supermatrix, supermatrices A–J and the recoded supermatrix D. Legend: starting supermatrix; partition boundaries, full taxon sampling; no partition boundaries, full taxon sampling; no partition boundaries, reduced taxon sampling; recoded supermatrix. Step labels recovered from the figure, in extraction order:]

Trim with BMGE (BLOSUM62, h = 0.4)
Trim with BMGE (BLOSUM62, h = 0.3)
Recode states with Dayhoff-6 strategy
Remove partitions that failed symmetry tests (genes violating model assumptions) with IQ-TREE and trim with BMGE (BLOSUM62, h = 0.5)
Filtering of COGs and of initial supermatrix
Keep only half of the genes with the lowest RCFV value. Trim resulting matrix with BMGE (BLOSUM62, h = 0.5)
Trimmed gene partitions with BMGE (BLOSUM62, h = 0.4). Keep genes with length >= 50 sites
Remove long-branched ingroup species from supermatrix G (TreSpEx)
Remove distantly related outgroups
Trim with BMGE (BLOSUM62, h = 0.5)
Remove outlier sequences; trim COGs with ALISCORE; remove partitions with zero information content (MARE)

Trimmed gene partitions with BMGE (BLOSUM62, h =0.5). Keep genes with >= 80 sites and