To reconcile nomenclature for the phylogeny, we compiled a taxonomic database for Anthophila based on Orr et al. (2021). We then updated this database to reflect revised genus-level classification from more recently published taxonomic references that have stabilised generic names.
The taxonomic database recognises seven bee families and 28 subfamilies, and provides a list of genera and species, together with notes on nomenclature decisions and the taxonomic research that supports them. It also includes links to the references for recent changes in nomenclature. Because taxonomy and nomenclature are constantly changing, especially for taxa whose relationships are yet to be resolved, the goal of this taxonomic database is to make it easier for researchers to track nomenclature changes in the phylogeny and to update names where necessary.
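As an illustration of how names attached to sequence records might be reconciled against such a database, the following is a minimal sketch. It assumes the database can be reduced to a set of accepted names plus a synonym-to-accepted lookup; the function names and the placeholder taxon names are hypothetical and are not taken from the database itself.

```python
def normalise(name: str) -> str:
    """Standardise a raw binomial: underscores to spaces, single spaces,
    capitalised genus, lower-case epithet(s)."""
    parts = name.replace("_", " ").split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def reconcile(raw_names, accepted, synonym_to_accepted):
    """Map each raw name to an accepted name.

    Returns (resolved, unresolved): a dict of raw name -> accepted name,
    and a list of raw names that need manual checking.
    """
    resolved, unresolved = {}, []
    for raw in raw_names:
        key = normalise(raw)
        if key in accepted:
            resolved[raw] = key
        elif key in synonym_to_accepted:
            resolved[raw] = synonym_to_accepted[key]
        else:
            unresolved.append(raw)
    return resolved, unresolved


# Placeholder names only (hypothetical, not real taxa).
accepted = {"Newgenus alpha", "Newgenus beta"}
synonyms = {"Oldgenus alpha": "Newgenus alpha"}
resolved, unresolved = reconcile(
    ["oldgenus_alpha", "Newgenus  beta", "Mystery gamma"], accepted, synonyms
)
```

Names that land in `unresolved` would be the ones to check by hand against the notes and references in the database.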
The phylogenetic approach was based on four key components: i) multi-gene sequence data downloaded from global sequence databases (NCBI and BOLD); ii) a family-level framework using a phylogenomic dataset; iii) published ultra-conserved element (UCE) datasets combined and condensed; and iv) nomenclature reconciled to our taxonomic database.
Gene fragment accessions provided the species (tips); the phylogenomic dataset provided a robust backbone consistent with phylogenomic information on tree shape (topology, relative branch lengths and outgroup rooting); and the UCE datasets contributed new and powerful phylogenomic data that are increasingly being used to enforce the best-estimate tree shape.
We focused on curating and analysing widely sampled loci used in previously published phylogenies to maximise the overlap of data among lineages and reduce supermatrix sparseness (Table 1). This also makes it easier to add sequences to the alignment as new data become available.
Gene | Sites | Genera | Species |
Nuclear | | | |
Phylogenomic 'stub' | 21,546 | 135 | 200 |
UCE 'stub' | 13,250 | 183 | 678 |
ArgK | 546 | 62 | 611 |
CAD | 450 | 138 | 677 |
EF-1α | 1,107 | 332 | 2,049 |
LW Rh | 642 | 318 | 1,846 |
NaK | 1,461 | 233 | 790 |
Pol II | 840 | 198 | 853 |
Wnt-1 | 456 | 257 | 841 |
28S rDNA | 1,440 | 339 | 1,253 |
Mitochondrial | | | |
16S rDNA | 522 | 74 | 508 |
COI | 1,473 | 330 | 3,839 |
Cytb | 1,047 | 67 | 487 |
Totals | 44,780 | 2,666 | 14,623 |
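Supermatrix sparseness can be expressed as the fraction of cells (species by sites) that hold no data. The sketch below is one simple way to compute it, assuming each locus is summarised by its alignment length and the set of species sampled for it. The function name and the toy values are illustrative and are not taken from Table 1.

```python
def supermatrix_sparseness(loci, all_species):
    """Fraction of supermatrix cells with no data.

    loci: dict mapping locus name -> (number of sites, set of sampled species)
    all_species: set of every species (row) in the supermatrix
    """
    total_sites = sum(n_sites for n_sites, _ in loci.values())
    total_cells = total_sites * len(all_species)
    if total_cells == 0:
        raise ValueError("supermatrix has no cells")
    filled_cells = sum(
        n_sites * len(sampled & all_species) for n_sites, sampled in loci.values()
    )
    return 1 - filled_cells / total_cells


# Toy example: two loci of equal length, four species in total.
toy_loci = {
    "locus_widely_sampled": (1000, {"a", "b", "c", "d"}),
    "locus_sparsely_sampled": (1000, {"a", "b"}),
}
sparseness = supermatrix_sparseness(toy_loci, {"a", "b", "c", "d"})
```

In this toy case a quarter of the cells are empty. Replacing the sparsely sampled locus with one sampled for more species lowers that fraction, which is the reasoning behind favouring widely sampled loci.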
Our general approach can be summarised in the following steps:
Download DNA sequences from NCBI and BOLD. We also used a curated subset of data from five UCE data matrices (Bossert et al. 2021a, 2021b; Freitas et al. 2021; Pisanty et al. 2022; Sless et al. 2022), and a family-level backbone from the phylogenomic work of Almeida et al. (2023).
Reconcile species names to the taxonomic database and implement a series of quality-checking steps, including building a tree for each gene dataset to check for aberrant samples.
Produce a concatenated supermatrix using the best sequence for each species (see phylogenetic coverage); compact the supermatrix to remove regions with minimal data or ambiguous alignment; and infer a phylogeny using IQ-TREE (Nguyen et al., 2015).
Convert the phylogeny to a dated chronogram using treePL (Smith and O’Meara, 2012), with the bee root calibrated using a broad normal distribution centred on 120 million years ago (SD 6).
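The concatenation and compaction step can be sketched as follows. This is a minimal illustration and not the pipeline used for the phylogeny: it assumes each locus is already aligned and held as a dictionary of species to sequence, it pads species missing from a locus with gaps, and it drops columns whose occupancy falls below a threshold. The function names and the `min_occupancy` value are assumptions, and removal of ambiguously aligned regions is not covered.

```python
MISSING = set("-?Nn")  # characters treated as "no data" in this sketch


def concatenate(alignments):
    """Join per-locus alignments into one supermatrix.

    alignments: dict mapping locus name -> {species: aligned sequence}
    Species absent from a locus are padded with gaps for that locus.
    """
    species = sorted({sp for aln in alignments.values() for sp in aln})
    rows = {sp: [] for sp in species}
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to equal length")
        width = lengths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def compact(matrix, min_occupancy=0.5):
    """Keep only columns where at least min_occupancy of species have data."""
    if not matrix:
        return {}
    species = list(matrix)
    n_cols = len(matrix[species[0]])
    keep = [
        i
        for i in range(n_cols)
        if sum(matrix[sp][i] not in MISSING for sp in species) / len(species)
        >= min_occupancy
    ]
    return {sp: "".join(matrix[sp][i] for i in keep) for sp in species}


# Toy input: two loci, with sp2 missing from the second locus.
toy = {
    "locusA": {"sp1": "ACGT", "sp2": "AC-T"},
    "locusB": {"sp1": "GG"},
}
supermatrix = concatenate(toy)
strict = compact(supermatrix, min_occupancy=1.0)
```

The compacted matrix would then be written to an alignment file and passed to the tree-inference software. The threshold trades alignment length against the amount of missing data.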
For expanded methods see: https://doi.org/10.1016/j.ympev.2023.107963.