This week I wanted students to learn more about phylogeography:
Manica Balant, Daniel Vitales, Zhiqiang Wang*, Zoltán Barina, Lin Fu, Tiangang Gao, Teresa Garnatje, Airy Gras, Muhammad Qasim Hayat, Marine Oganesian, Jaume Pellicer, Seyed A. Salami, Alexey P. Seregin, Nina Stepanyan-Gandilyan, Nusrat Sultana, Shagdar Tsooj, Magsar Urgamal, Joan Vallès, Robin van Velzen, Lisa Pokorny. 2025. “Integrating target capture with whole genome sequencing of recent and natural history collections to explain the phylogeography of wild-growing and cultivated Cannabis”. Plants People Planet. 1-18. https://doi.org/10.1002/ppp3.70043 [* = equal contributions]
This study examines wild and cultivated Cannabis, which according to the paper has been used “as fibre (ropes, fabric and paper), medicinally (over 200 recorded uses), as food (nutrient-rich seeds) and in various magico-religious rituals”. It uses genomic data from many samples from around the globe to examine population structure.
One thing that has stood out in this paper and others considered for the course is the now apparently standard toolset of mafft, Structure, ASTRAL-III, and IQ-TREE2. The age of some of the software is unexpected: the latest release of Structure is from over 13 years ago, and the latest release of ASTRAL-III is from over five years ago. That’s not necessarily a bad thing – it’s not like hackers are trying to take over the banking system by exploiting potential vulnerabilities in unpatched phylogenetic software – it’s just surprising. You would think that, especially for popular analytical questions, there would be new innovations or just new hotness. For ASTRAL the developers “encourage using the new code” of ASTER, though as this was only published in a peer-reviewed journal in July 2025 it’s reasonable that papers we’re reading now don’t use it yet (and I doubt answers under ASTRAL-III are wrong). But I do worry that as researchers continue to use ChatGPT and similar tools for analyses and ask “what’s the problem in doing stats with an (AI) consultant?”, popular approaches will be baked into the training data and then continually suggested as the options to use, even when papers come out showing problems with classic approaches (see all the studies STILL looking at net diversification rate through time) or when new approaches unlock new questions. There’s already a bias towards the familiar built into the field: it’s more efficient to use approaches one already knows (and you might know their limitations well, too, which is important), and all the tutorials and posts are about the classic software. But at least new info is constantly being put into our human brains, where we could have a preference towards recent approaches (though on R-sig-phylo there was a counterargument that agentic AIs will learn about new approaches and suggest them). Though I guess one advantage of fossilization of methods advice is we’ll all start using MacClade again.
One thing I loved about the paper was the use of both herbarium and fresh specimens. It’s yet another example of the benefits of repositories of biological information (recognized at many, but not all, places) as well as of the expertise to collect in the field. The paper also demonstrated the feasibility of genomic-scale data.
A shift in thinking I’ve undergone, but that still feels unnatural, is how in modern evolutionary biology a key workflow step is massively deleting data. It wasn’t that long ago we were looking at chromatograms in Sequencher to identify every possible base by eye (while still excluding bad reads); in contrast, in this paper the data were filtered from 68,212 single nucleotide polymorphisms (SNPs) down to just 2,875. This kind of filtering is important for data from GBIF (“nope, that oak is not from the ocean, someone flipped a sign”) and things like the TRY database of plant traits, too. This paper was useful for discussing this shift with students.
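For students who haven’t seen what this kind of filtering looks like in practice, here is a minimal sketch in Python. It is not the paper’s pipeline: the function name, the genotype encoding, and the thresholds (maximum missing data and minimum minor allele frequency) are all assumptions I picked for illustration. The point is just that a couple of simple rules can discard most sites; in the paper, going from 68,212 SNPs to 2,875 means only about 4% were retained.

```python
# Hypothetical illustration only: the thresholds and encoding below are
# assumptions, not the filters used in the paper.

def filter_snps(genotypes, max_missing=0.2, min_maf=0.05):
    """Return indices of SNPs that pass simple quality filters.

    genotypes: list of SNP rows; each row holds one value per sample,
    coded as the count of the alternate allele (0, 1, or 2) or None
    when the genotype is missing.
    """
    kept = []
    for idx, row in enumerate(genotypes):
        called = [g for g in row if g is not None]
        if not called:
            continue  # no data at all for this site
        n_missing = len(row) - len(called)
        if n_missing > max_missing * len(row):
            continue  # too many samples lack a call
        alt_freq = sum(called) / (2 * len(called))
        maf = min(alt_freq, 1 - alt_freq)  # minor allele frequency
        if maf < min_maf:
            continue  # site is invariant or nearly so
        kept.append(idx)
    return kept


example = [
    [0, 1, 2, 1, 0, 1],        # well sampled and variable: kept
    [0, 0, 0, 0, 0, 0],        # invariant: dropped
    [None, None, 1, 0, 2, 0],  # a third of samples missing: dropped
    [0, 0, 0, 1, 0, None],     # one missing, still variable: kept
]
print(filter_snps(example))  # prints [0, 3]
```

Real pipelines typically apply more filters than this (read depth, linkage, paralogy, and so on), which is part of why so few sites survive.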
A conclusion from the paper was that Cannabis sativa is one species (as hypothesized by Linnaeus) and not two with the addition of C. indica. Perhaps a useful anecdote for intro bio, when the standard move is to dunk on “Lamarckism” (though this requires more nuance than the cartoon version): the describer of the incorrectly split C. indica was Jean-Baptiste Lamarck. Though one student raised a good point: it would be interesting to redo the Structure analyses using only the wild plants, not escaped or domesticated cultivars – one possibility is that human meddling allowed interbreeding of C. sativa and C. indica, which would otherwise have been relatively reproductively isolated.
One note for those who might want to teach the paper: I made sure to let students know that some uses of the plant are illegal in this jurisdiction (true at both our state and federal level) before asking them to provide information (for discussion questions on the paper before class, for example). I don’t want students who are trying to, say, make a joke to put in writing something that could be read by others as disclosure of illegal drug use – who knows how such info could be scraped into systems and misinterpreted in the future.
I made intro slides with some of my background material and some figures from the paper: PDF and PowerPoint.
[Note that this blog post is dated for the class date, but I’m actually pushing it on Oct. 11, 2025]
To subscribe, go to https://brianomeara.info/blog.xml in an RSS reader.
Citation
@online{o'meara2025,
author = {O’Meara, Brian},
title = {PhyloPapers 2025, {Phylogeography}},
date = {2025-09-17},
url = {https://brianomeara.info/posts/phylopapers_2025_Sep_17/},
langid = {en}
}