Methods

Taxonomic database

To reconcile nomenclature for the phylogeny, we compiled a taxonomic database for Anthophila based on Orr et al. (2021). We then updated this database to reflect revised generic-level classifications from more recent taxonomic studies that have helped stabilise genus concepts.

The taxonomic database recognises seven bee families and 28 subfamilies. It lists genera and species together with notes on nomenclatural decisions and the taxonomic research that supports them, and it links to the references documenting recent nomenclatural changes. Because taxonomy and nomenclature change constantly, especially for taxa whose relationships are yet to be resolved, the database is intended to make it easy for researchers to track nomenclatural changes in the phylogeny and to update names where necessary.
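As an illustration of how such a database can support name reconciliation, the sketch below assumes a simple in-memory layout: accepted binomials mapped to family and subfamily, plus a synonym table pointing older names or combinations at accepted ones with a note. The class, its fields and the example entries are hypothetical and are not the structure of the authors' database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TaxonomicDatabase:
    """Hypothetical in-memory taxonomic database (illustrative layout only)."""

    # Accepted binomial -> (family, subfamily)
    accepted: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # Synonym or former combination -> (accepted binomial, nomenclature note)
    synonyms: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    @staticmethod
    def normalise(name: str) -> str:
        """Turn tip-label style names ('Apis_mellifera') into 'Apis mellifera'."""
        return " ".join(name.replace("_", " ").split())

    def reconcile(self, name: str) -> Optional[str]:
        """Return the accepted name for `name`, or None if it is not listed."""
        cleaned = self.normalise(name)
        if cleaned in self.accepted:
            return cleaned
        if cleaned in self.synonyms:
            return self.synonyms[cleaned][0]
        return None

    def classify(self, name: str) -> Optional[Tuple[str, str]]:
        """Return (family, subfamily) for a name after reconciliation."""
        accepted = self.reconcile(name)
        return self.accepted.get(accepted) if accepted else None


# Illustrative entries only, not taken from the authors' database
db = TaxonomicDatabase(
    accepted={"Apis mellifera": ("Apidae", "Apinae")},
    synonyms={"Apis mellifica": ("Apis mellifera", "illustrative synonym entry")},
)
print(db.reconcile("Apis_mellifica"))  # Apis mellifera
print(db.classify("Apis mellifica"))   # ('Apidae', 'Apinae')
```

Keeping the note alongside each synonym mirrors the idea of recording why a name was changed, so a later revision only requires editing one table entry.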

Phylogenetic approach

The phylogenetic approach was based on four key components: i) multi-gene sequence data downloaded from global sequence databases (NCBI and BOLD); ii) a family-level framework from a phylogenomic dataset; iii) published ultraconserved element (UCE) datasets, combined and condensed; and iv) nomenclature reconciled against our taxonomic database.

Gene-fragment accessions provided the species (tips); the phylogenomic dataset provided a robust backbone, consistent with phylogenomic evidence on tree shape (topology, relative branch lengths and outgroup rooting); and the UCE datasets contributed newer, information-rich phylogenomic data of the kind increasingly used to constrain the best-estimate tree shape.

We focused on curating and analysing widely sampled loci used in previously published phylogenies, to maximise overlap of data among lineages and reduce supermatrix sparseness (Table 1). This also makes it straightforward to add new sequence data to the alignment as they become available.
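Supermatrix sparseness can be quantified as the fraction of species-by-site cells that hold no data. The sketch below assumes a species either has a gene (filling all of that gene's aligned sites) or lacks it, and ignores gaps within sequences; the function name and the species coverage pattern are hypothetical, while the three site counts are taken from Table 1.

```python
from typing import Dict, Set


def supermatrix_sparseness(coverage: Dict[str, Set[str]], gene_sites: Dict[str, int]) -> float:
    """Fraction of species-by-site cells in the concatenated matrix that are empty.

    coverage:   species -> set of genes sampled for that species
    gene_sites: gene -> number of aligned sites for that gene
    """
    total_cells = len(coverage) * sum(gene_sites.values())
    filled_cells = sum(gene_sites[g] for genes in coverage.values() for g in genes)
    return 1 - filled_cells / total_cells


# Site counts from Table 1; the species coverage below is hypothetical.
gene_sites = {"COI": 1473, "EF-1α": 1107, "28S rDNA": 1440}
coverage = {
    "species A": {"COI"},
    "species B": {"COI", "EF-1α"},
    "species C": {"COI", "EF-1α", "28S rDNA"},
}
print(round(supermatrix_sparseness(coverage, gene_sites), 3))  # 0.331
```

A locus sampled for most species adds few empty cells, which is why favouring widely sampled loci keeps the matrix denser.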

Table 1. Sampling summary across genetic data

Gene*	Sites	Genera	Species
Nuclear
Phylogenomic ‘stub’	21,546	135	200
UCE ‘stub’	13,250	183	678
ArgK	546	62	611
CAD	450	138	677
EF-1α	1,107	332	2,049
LW Rh	642	318	1,846
NaK	1,461	233	790
Pol II	840	198	853
Wnt-1	456	257	841
28S rDNA	1,440	339	1,253
Mitochondrial
16S rDNA	522	74	508
COI	1,473	330	3,839
Cytb	1,047	67	487
Totals	44,780	2,666	14,623

*For our purposes, ‘gene’ refers to a discrete genetic data unit.

Our general approach can be summarised in the following steps:

1. Download DNA sequences from NCBI and BOLD. We also used a curated subset of data from five UCE data matrices (Bossert et al., 2021, 2022; Freitas et al., 2021; Pisanty et al., 2022; Sless et al., 2022) and a family-level backbone from the phylogenomic work of Almeida et al. (2023).
2. Reconcile species names to the taxonomic database and apply a series of quality-control steps, including trees inferred for each gene dataset to detect aberrant samples.
3. Produce a concatenated supermatrix using the best sequence for each species (see phylogenetic coverage); compact the supermatrix by removing regions with minimal data or ambiguous alignment; and infer a phylogeny using IQ-TREE (Nguyen et al., 2015).
4. Convert the tree to a dated chronogram using treePL (Smith and O’Meara, 2012), calibrating the bee root with a broad normal distribution centred on 120 million years ago (SD = 6 Myr).
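The supermatrix-building step can be sketched as follows. This is a simplified illustration under explicit assumptions, not the authors' pipeline: the "best" sequence is taken to mean the one with the most informative (non-gap, non-N) characters, sequences for each gene are assumed to be pre-aligned to a common length, and compaction is a plain column-occupancy filter with a hypothetical `min_filled` threshold. Screening for ambiguous alignment is not shown, and all function names and the toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Characters treated as missing data (an assumption; other IUPAC ambiguity codes are kept).
MISSING = set("-?N")


def informative_length(seq: str) -> int:
    """Count characters that are not gaps or missing-data placeholders."""
    return sum(1 for c in seq.upper() if c not in MISSING)


def best_per_species(records: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], str]:
    """Keep, for each (species, gene), the aligned sequence with the most informative characters."""
    best: Dict[Tuple[str, str], str] = {}
    for species, gene, seq in records:
        key = (species, gene)
        if key not in best or informative_length(seq) > informative_length(best[key]):
            best[key] = seq
    return best


def concatenate(best: Dict[Tuple[str, str], str], gene_sites: Dict[str, int]) -> Dict[str, str]:
    """Join genes in a fixed (alphabetical) order, padding genes a species lacks with gaps."""
    species = sorted({sp for sp, _ in best})
    genes = sorted(gene_sites)
    return {
        sp: "".join(best.get((sp, g), "-" * gene_sites[g]) for g in genes)
        for sp in species
    }


def drop_sparse_columns(matrix: Dict[str, str], min_filled: int) -> Dict[str, str]:
    """Remove columns with fewer than `min_filled` informative characters."""
    if not matrix:
        return {}
    rows = list(matrix.values())
    keep = [
        i for i in range(len(rows[0]))
        if sum(row[i].upper() not in MISSING for row in rows) >= min_filled
    ]
    return {sp: "".join(row[i] for i in keep) for sp, row in matrix.items()}


# Toy, pre-aligned sequences for two hypothetical species
records = [
    ("species A", "COI", "ACGT--"),
    ("species A", "COI", "ACGTAC"),  # more complete duplicate wins
    ("species B", "COI", "ACG---"),
    ("species B", "EF-1α", "TTGA"),
]
matrix = concatenate(best_per_species(records), {"COI": 6, "EF-1α": 4})
print(matrix)  # {'species A': 'ACGTAC----', 'species B': 'ACG---TTGA'}
print(drop_sparse_columns(matrix, min_filled=2))  # {'species A': 'ACG', 'species B': 'ACG'}
```

Raising `min_filled` trades alignment length for density; the appropriate threshold for a real dataset is a judgement call not specified here.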

For expanded methods see: https://doi.org/10.1016/j.ympev.2023.107963.
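To make "broad" concrete for the root calibration described above (mean μ = 120 Ma, SD σ = 6 Myr), the central 95% of that normal distribution spans roughly 108 to 132 million years ago. This is only the arithmetic of the distribution and says nothing about how the calibration was encoded in treePL.

```latex
\mu \pm z_{0.975}\,\sigma = 120 \pm 1.96 \times 6 = 120 \pm 11.76
\;\Rightarrow\; [108.24,\ 131.76]\ \text{Ma}
```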

References

Almeida, E.A.B., Bossert, S., Danforth, B.N., Porto, D.S., Freitas, F.V., et al., 2023. The evolutionary history of bees in time and space. Current Biology, 33(16), 3409–3422. doi:10.1016/j.cub.2023.07.005.
Bossert, S., Murray, E.A., Pauly, A., Chernyshov, K., Brady, S.G., Danforth, B.N., 2021. Gene tree estimation error with ultraconserved elements: an empirical study on Pseudapis bees. Systematic Biology, 70(4), 803–821. doi:10.1093/sysbio/syaa097.
Bossert, S., Wood, T.J., Patiny, S., Michez, D., Almeida, E.A.B., Minckley, R.L., Packer, L., Neff, J.L., Copeland, R.S., Straka, J., Pauly, A., Griswold, T., Brady, S.G., Danforth, B.N., Murray, E.A., 2022. Phylogeny, biogeography and diversification of the mining bee family Andrenidae. Systematic Entomology, 47(2), 283–302. doi:10.1111/syen.12530.
Freitas, F.V., Branstetter, M.G., Griswold, T., Almeida, E.A.B., 2021. Partitioned Gene-Tree Analyses and Gene-Based Topology Testing Help Resolve Incongruence in a Phylogenomic Study of Host-Specialist Bees (Apidae: Eucerinae). Molecular Biology and Evolution, 38(3), 1090–1100. doi:10.1093/molbev/msaa277.
Nguyen, L.T., Schmidt, H.A., von Haeseler, A., Minh, B.Q., 2015. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1), 268–274. doi:10.1093/molbev/msu300.
Orr, M.C., Hughes, A.C., Chesters, D., Pickering, J., Zhu, C.-D., Ascher, J.S., 2021. Global Patterns and Drivers of Bee Distribution. Current Biology, 31(3), 451–458. doi:10.1016/j.cub.2020.10.053.
Pisanty, G., Richter, R., Martin, T., Dettman, J., Cardinal, S., 2022. Molecular phylogeny, historical biogeography and revised classification of andrenine bees (Hymenoptera: Andrenidae). Molecular Phylogenetics and Evolution, 170, 107151. doi:10.1016/j.ympev.2021.107151.
Sless, T.J.L., Branstetter, M.G., Gillung, J.P., Krichilsky, E.A., Tobin, K.B., Straka, J., Rozen, J.G., Freitas, F.V., Martins, A.C., Bossert, S., Searle, J.B., Danforth, B.N., 2022. Phylogenetic relationships and the evolution of host preferences in the largest clade of brood parasitic bees (Apidae: Nomadinae). Molecular Phylogenetics and Evolution, 166, 107326. doi:10.1016/j.ympev.2021.107326.
Smith, S.A., O’Meara, B.C., 2012. treePL: Divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics, 28(20), 2689–2690. doi:10.1093/bioinformatics/bts492.