BeeTree

Methods

Taxonomic database

To reconcile nomenclature across the phylogeny, we compiled a taxonomic database for Anthophila based on Orr et al. (2021). We modified this database to reflect revised generic-level classifications from more recent taxonomic publications that have brought greater stability to genus-level classification.

The taxonomic database recognises seven bee families and 28 subfamilies. It lists genera and species together with notes on nomenclatural decisions and the taxonomic research that supports them, and it links to the references behind recent nomenclatural changes. Because taxonomy and nomenclature change constantly, especially in groups whose relationships are not yet resolved, the database is intended to make it easier for researchers to track nomenclatural changes in the phylogeny and to update names where necessary.
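Reconciling names against such a database amounts to a lookup from any recorded name, accepted or synonym, to its accepted name. The sketch below is a minimal illustration under assumed conventions. The `TaxonRecord` fields, the `normalise`, `build_lookup` and `reconcile` helpers, and the genus names are all hypothetical and are not taken from the BeeTree database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonRecord:
    """One row of a hypothetical taxonomic database (field names are illustrative)."""
    name: str            # name as it may appear on a sequence record (accepted or synonym)
    accepted_name: str   # currently accepted binomial
    family: str
    note: str = ""       # nomenclature decision and supporting reference


def normalise(name: str) -> str:
    """Treat 'Genus_species', 'genus  species' and 'Genus species' as the same name."""
    return " ".join(name.replace("_", " ").split()).lower()


def build_lookup(records):
    """Index records by normalised name for constant-time lookup."""
    return {normalise(r.name): r for r in records}


def reconcile(raw_name, lookup):
    """Return the accepted name for raw_name, or None if it is not in the database."""
    record = lookup.get(normalise(raw_name))
    return record.accepted_name if record is not None else None


# Hypothetical example: 'Oldgenus exemplaris' is a synonym of 'Newgenus exemplaris'.
records = [
    TaxonRecord("Newgenus exemplaris", "Newgenus exemplaris", "Apidae"),
    TaxonRecord("Oldgenus exemplaris", "Newgenus exemplaris", "Apidae",
                "transferred to Newgenus (hypothetical)"),
]
lookup = build_lookup(records)
print(reconcile("Oldgenus_exemplaris", lookup))  # Newgenus exemplaris
print(reconcile("Unlisted species", lookup))     # None
```

Normalising before the lookup keeps purely cosmetic differences, such as underscores in tip labels, from being treated as distinct names.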


Phylogenetic approach

The phylogenetic approach was based on four key components: i) multi-gene sequence data downloaded from global sequence databases (NCBI and BOLD); ii) a family-level framework derived from a phylogenomic dataset; iii) published ultraconserved element (UCE) datasets, combined and condensed; and iv) nomenclature reconciled to our taxonomic database.

The gene-fragment accessions provided the species (tips); the phylogenomic dataset provided a robust backbone that kept tree shape (topology, relative branch lengths and outgroup rooting) consistent with phylogenomic evidence; and the UCE datasets contributed new and powerful phylogenomic data of the kind increasingly used to enforce a best-estimate tree shape.

We focused on curating and analysing widely sampled loci used in previously published phylogenies to maximize overlap of data among lineages and reduce supermatrix sparseness (Table 1). This also facilitates the addition of sequence data to the alignment as new data become available.
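Sparseness can be read as the share of missing cells in the concatenated supermatrix: every species occupies a row across all loci, so a species sequenced for only one gene contributes mostly gaps. The minimal sketch below assumes, hypothetically, that per-gene alignments are held in memory as dictionaries mapping species to aligned sequence, and uses toy sequences. It shows how concatenation pads absent species and how occupancy could be measured.

```python
def concatenate(gene_alignments):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments maps gene -> {species: aligned sequence}; every sequence
    within a gene is assumed to have the same length. Species absent from a
    gene are filled with gaps ('-') for that gene's length.
    Returns the supermatrix and a list of (gene, start, end) partitions using
    0-based, end-exclusive coordinates.
    """
    species = sorted({sp for aln in gene_alignments.values() for sp in aln})
    pieces = {sp: [] for sp in species}
    partitions, start = [], 0
    for gene, aln in gene_alignments.items():
        length = len(next(iter(aln.values())))
        for sp in species:
            pieces[sp].append(aln.get(sp, "-" * length))
        partitions.append((gene, start, start + length))
        start += length
    return {sp: "".join(parts) for sp, parts in pieces.items()}, partitions


def occupancy(matrix):
    """Fraction of supermatrix cells holding data rather than '-' or '?'."""
    cells = sum(len(seq) for seq in matrix.values())
    filled = sum(c not in "-?" for seq in matrix.values() for c in seq)
    return filled / cells if cells else 0.0


# Toy data: two genes, three species, sp2 and sp3 each missing one gene.
genes = {
    "COI": {"sp1": "ACGT", "sp2": "ACGA"},
    "CAD": {"sp1": "GGC", "sp3": "GGT"},
}
matrix, partitions = concatenate(genes)
print(matrix["sp3"])                # ----GGT
print(partitions)                   # [('COI', 0, 4), ('CAD', 4, 7)]
print(round(occupancy(matrix), 3))  # 0.667
```

Loci shared by many species raise occupancy directly, which is the rationale for favouring widely sampled markers.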


Table 1. Sampling summary across genetic data

Gene*                    Sites  Genera  Species
Nuclear
  Phylogenomic ‘stub’   21,546     135      200
  UCE ‘stub’            13,250     183      678
  ArgK                     546      62      611
  CAD                      450     138      677
  EF-1α                  1,107     332    2,049
  LW Rh                    642     318    1,846
  NaK                    1,461     233      790
  Pol II                   840     198      853
  Wnt-1                    456     257      841
  28S rDNA               1,440     339    1,253
Mitochondrial
  16S rDNA                 522      74      508
  COI                    1,473     330    3,839
  Cytb                   1,047      67      487
Totals                  44,780   2,666   14,623
*For our purposes, ‘gene’ refers to a discrete genetic data unit.


Our general approach can be summarised in the following steps:

  1. Download DNA sequences from NCBI and BOLD. We also used a curated subset of data from five UCE data matrices (Bossert et al. 2021a, 2021b; Freitas et al. 2021; Pisanty et al. 2022; Sless et al. 2022) and a family-level backbone from the phylogenomic work of Almeida et al. (2023).

  2. Reconcile species names to the taxonomic database and apply a series of quality-control steps, including inspecting trees inferred from each gene dataset to detect aberrant samples.

  3. Produce a concatenated supermatrix using the best available sequence for each species (see phylogenetic coverage); compact the supermatrix by removing regions with minimal data or ambiguous alignment; and infer a phylogeny using IQ-TREE (Nguyen et al. 2015).

  4. Convert the phylogeny to a dated chronogram using treePL (Smith and O’Meara 2012), with the root of the bee tree calibrated using a broad normal distribution centred on 120 million years ago (SD = 6 million years).
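Step 3 leaves the selection and compaction criteria to the expanded methods, so the sketch below substitutes two simple, hypothetical rules to show the mechanics. The "best" sequence for a species is taken to be the one with the most informative characters, and a column is dropped when fewer than `min_taxa` species have data there. The helper names, the `MISSING` character set, the thresholds and the data are all illustrative assumptions.

```python
MISSING = set("-?N")  # gap, unknown and fully ambiguous base (assumed convention)


def has_data(char):
    """True if an alignment character carries sequence information."""
    return char.upper() not in MISSING


def best_sequence(candidates):
    """Pick the candidate with the most informative characters.

    Hypothetical criterion; ties go to the first candidate listed.
    """
    return max(candidates, key=lambda seq: sum(has_data(c) for c in seq))


def compact(matrix, min_taxa):
    """Drop alignment columns where fewer than min_taxa species have data."""
    if not matrix:
        return {}
    names = list(matrix)
    seqs = [matrix[name] for name in names]
    keep = [i for i in range(len(seqs[0]))
            if sum(has_data(s[i]) for s in seqs) >= min_taxa]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


# Toy example: several accessions for one species, then a three-species matrix.
print(best_sequence(["AC--", "ACGT", "A-NN"]))  # ACGT
matrix = {"sp1": "AC-T", "sp2": "AC--", "sp3": "A---"}
print(compact(matrix, min_taxa=2))  # {'sp1': 'AC', 'sp2': 'AC', 'sp3': 'A-'}
```

This sketch handles only the minimal-data criterion; flagging ambiguously aligned regions would need a separate test.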

For expanded methods see: https://doi.org/10.1016/j.ympev.2023.107963.



References