When the algorithm sees what the botanist missed

A machine learning study quietly validates indigenous taxonomy of Banisteriopsis caapi — and raises a harder question about what counts as scientific knowledge.

The paper

Biazatti and colleagues, writing in iScience (Biazatti et al., 2026, DOI: 10.1016/j.isci.2026.115753), have done something that on the surface looks modest and on closer inspection is rather more interesting than the abstract lets on. They trained six machine learning classifiers on photographs of dried leaves from herbarium specimens of Banisteriopsis caapi — the MAO-inhibiting vine without which orally administered DMT is inert — and asked whether the algorithms could recover the folk taxonomic categories used by traditional ayahuasca practitioners and the Brazilian syncretic religions.

The headline number is 70% overall classification accuracy using a support vector machine on combined adaxial and abaxial leaf images. That figure is not the story. The story is what it implies about a question that has been hanging unresolved in the ethnobotany of B. caapi for at least half a century.

The setup

The dataset drew on forty-seven vine plants from the UB Herbarium at the University of Brasília, yielding 384 leaves and 768 images (two per leaf, presumably one adaxial and one abaxial) after stringent quality filtering reduced an initial dataset of over 3,000. The classes were nine folk types of B. caapi (Arara, Cabi, Caupuri, Caupuri-de-nós-longos, Hybrid, Ourinho, Pajezinho, Quebrador, Tucunaca) plus Diplopterys cabrerana as a phylogenetically close outgroup. Features extracted via the authors' open-source VCodeAyahuasca pipeline covered colour spaces (RGB, HSV, CIELab), shape descriptors (Hu moments), texture measures (co-occurrence matrix, LBP, HOG), and a battery of filters (Gabor, Canny, Fourier, Scharr, Laplacian, Sobel, Prewitt). Six classifiers were run under 10-fold cross-validation: local KNN, Rseslib KNN, optimised forest, random forest, SVM, and DL4J.
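To give a feel for what one of the texture measures listed above actually computes, here is a minimal pure-Python sketch of a basic 8-neighbour local binary pattern (LBP). It is an illustration only and not the VCodeAyahuasca implementation: the function names, the bit ordering, and the toy 3x3 patch are all assumptions made for this example.

```python
from collections import Counter

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The bit ordering is a convention chosen for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Return the 8-bit LBP code of interior pixel (r, c) in a 2D grayscale grid."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Count LBP codes over all interior pixels; the histogram is the texture feature."""
    hist = Counter()
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Hypothetical 3x3 grayscale patch (not data from the study).
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
print(lbp_code(patch, 1, 1))   # 26: the top, right and bottom-right neighbours are >= 6
print(lbp_histogram(patch))
```

On a real leaf image the histogram of codes, rather than any single code, would be the feature vector handed to the classifiers, alongside the colour, shape, and filter features.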

No classifier significantly outperformed the others (ANOVA p ≥ 0.05), which is itself worth noting: the morphological signal is robust enough to be picked up across mathematically distinct architectures, rather than being an artefact of one model's particular inductive biases. Combining both leaf surfaces consistently improved accuracy over either surface alone (p = 0.0466), with abaxial information evidently contributing complementary signal — unsurprising given the venation, trichome, and gland diversity typically expressed on the underside of angiosperm leaves.
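The comparison between classifiers can be sketched as a one-way ANOVA over per-fold accuracies. The exact test design used in the paper is not described here, so a plain one-way layout is an assumption, as are the function name and the fold accuracies below, which are invented for illustration and are not results from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per classifier."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the classifier means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Fold-to-fold variation inside each classifier.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-fold accuracies for three classifiers (NOT from the paper).
fold_accuracy = {
    "svm": [0.71, 0.68, 0.73, 0.69, 0.70],
    "random_forest": [0.69, 0.70, 0.66, 0.72, 0.68],
    "knn": [0.67, 0.71, 0.69, 0.66, 0.70],
}
f_stat, df_between, df_within = one_way_anova_f(list(fold_accuracy.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

The F statistic is then compared against the F distribution with those degrees of freedom to obtain a p-value; scipy.stats.f_oneway performs both steps if SciPy is available. A small F means the spread between classifiers is no larger than the noise between folds, which is the pattern the paper reports.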

What the numbers actually say

Per-type accuracy varied dramatically: Hybrid 97%, D. cabrerana 91%, Arara 88%, Pajezinho 84%, Tucunaca 76%, Caupuri-de-nós-longos 74%, Ourinho 64%, Caupuri 63%, Cabi 47%, Quebrador 34%. A similarity network analysis resolved two clusters: a tight grouping of D. cabrerana, Hybrid, and Arara with misclassifications only among themselves, and a more entangled cluster of the remaining seven folk types with strong confusions between Ourinho–Caupuri, Cabi–Caupuri, and Quebrador–Tucunaca.
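A quick arithmetic check on the per-type figures quoted above: their unweighted mean and the uniform-guess baseline for ten classes. This is a sketch, not the paper's analysis. The function names are hypothetical, and the interpretation assumes each per-type figure is the share of that type's images classified correctly.

```python
# Per-type accuracies (percent) as quoted in the text above.
per_type_accuracy = {
    "Hybrid": 97, "D. cabrerana": 91, "Arara": 88, "Pajezinho": 84,
    "Tucunaca": 76, "Caupuri-de-nós-longos": 74, "Ourinho": 64,
    "Caupuri": 63, "Cabi": 47, "Quebrador": 34,
}


def macro_average(acc):
    """Unweighted mean over classes: every type counts equally."""
    return sum(acc.values()) / len(acc)


def uniform_chance(acc):
    """Expected accuracy (percent) of guessing uniformly among the classes."""
    return 100 / len(acc)


macro = macro_average(per_type_accuracy)
chance = uniform_chance(per_type_accuracy)
print(f"macro-average: {macro:.1f}%")   # 71.8%
print(f"uniform chance: {chance:.1f}%")  # 10.0%
```

The unweighted mean of 71.8% sits close to the reported 70% overall figure; under the stated assumption, the gap would simply reflect the types not contributing equal numbers of images. Even the weakest type, Quebrador at 34%, is well above the 10% uniform-guess baseline, although that baseline assumes balanced classes and a majority-class guesser could do somewhat better.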

The authors are appropriately honest about the limitations. Several folk types are represented by a single plant. Cabi and Quebrador lost roughly three-quarters of their initial images to quality filtering, leaving sample sizes that almost certainly constrain what the model could learn. The classifier architectures themselves are closer to the cutting edge of 2015 than to that of 2026: hand-engineered features fed into Weka, rather than a fine-tuned vision transformer. Anyone reading this as a deployable field identifier is reading it wrong.

Why the modest result is the interesting result

The point is not the accuracy. The point is the epistemological asymmetry the paper exposes.

Folk taxonomists do not classify B. caapi by its leaves. They never have. The traditional diagnostic criteria are stem morphology (swollen nodes or smooth), bark and fibre characteristics, brew taste and density, and the qualitative character of the resulting experience. Leaves were considered uninformative — which they largely are, by direct human inspection. To the eye, one B. caapi leaf looks much like another.

What Biazatti et al. demonstrate is that the leaves nevertheless carry information about which folk category the plant belongs to. The folk classification, built from organs and properties the taxonomists were actually examining, propagates into organs and properties they were not. The categories are tracking something deep enough in the plant's biology that the signal leaks out into tissues the folk taxonomists ignored.

That is a particular kind of validation. If the folk system were arbitrary — if Caupuri and Tucunaca were merely cultural labels attached to plants that happened to grow in different villages, or names for slightly different ritual uses of indistinguishable biological material — the leaves would be silent. The model would perform at chance. The fact that it does not means the indigenous taxonomy is carving the plant at something close to a real joint. The shamans and feitores are not generating folklore; they are doing botany, with a different set of instruments and a different vocabulary.

This connects to a broader and more uncomfortable point about Linnaean nomenclature. Western botany classifies all of these vines as a single species, Banisteriopsis caapi (Spruce ex Griseb.) C.V. Morton, full stop. There is no formal sub-specific taxonomy. The folk names exist only in ethnobotanical literature and ceremonial practice; they have never been elevated to varieties, subspecies, or even formally described cultivars. To refer to these distinctions in scientific writing, researchers have to borrow indigenous terms wholesale: Oliveira et al. (2023) and now Biazatti et al. simply write "B. caapi folk type Caupuri" because there is no Latin alternative. The folk taxonomy is, at present, the only taxonomy at this resolution. Western science has yet to construct its own vocabulary for distinctions it can now verify but did not discover.

The pharmacological tail

For DMT research specifically, this matters because the pharmacology of ayahuasca is not the pharmacology of DMT alone. The β-carboline profile of the B. caapi component (the relative proportions of harmine, harmaline, and tetrahydroharmine) modulates onset, duration, and the qualitative character of the experience. Tetrahydroharmine in particular is not merely an MAO-A inhibitor: it is also a weak serotonin reuptake inhibitor in its own right.

If folk varieties correspond to morphologically distinct clusters that are stable enough to be detected by ML on leaves alone, the obvious next question is whether they also correspond to chemotypically distinct clusters. Santos et al. (2020), referenced in the paper, have already shown β-carboline profile variation across B. caapi accessions. McKenna's earlier phytochemical work suggested the same. What no one has yet done — and what is now plausibly within reach — is to run the alkaloid profiling on specimens whose folk-variety labels have been independently confirmed by a non-circular method, and ask whether the harmine-to-tetrahydroharmine ratio tracks the classification.

If it does, "ayahuasca" is not one preparation but a family of preparations, and a generation of clinical research treating it as a single pharmacological entity has been collapsing meaningful variation. The implications for everything from dose-response modelling to the interpretation of subjective effects in trials would be non-trivial. Trial protocols that source B. caapi from a single supplier without specifying folk type may have been controlling for less than they think.

The cogitronomy angle

There is something philosophically tidy about this study that ARDMT keeps returning to. The folk taxonomists generated reproducible distinctions over centuries using methods Western science has historically dismissed: direct experiential engagement, oral transmission, ceremonial use as a discriminating context. Biazatti and colleagues, using a method that has no cultural commitments and cannot taste the brew, recover those distinctions with roughly 70% accuracy from a body part the original taxonomists were not even looking at.

That is not a story about machine learning vindicating science over folklore. It is a story about two different epistemic systems converging on the same underlying biological signal from opposite directions. The folk system arrived first, with more sophisticated diagnostic criteria (stems, brews, effects). The computational system arrives later, with cruder data (dried leaves only) but with no skin in the game. When they agree, what we have learned is not that the algorithm is clever. We have learned that the original observers were paying attention to something real.

The honest framing the authors land on — leaf morphology alone insufficient, future work should integrate stem traits, anatomy, chemistry, genetics — is exactly right. But the methodological modesty should not obscure the size of the philosophical claim sitting under the experiment. Traditional ecological knowledge, in this case, generated empirically testable hypotheses about plant taxonomy that survive blind validation by methods developed in total ignorance of the originating tradition. That is not "complementary to" science. That is science, performed by people who would not have used the word.

Marginalia

One suspects the obvious next study is already being designed. Take specimens whose folk-variety labels the SVM has confirmed at high confidence — Arara, Hybrid, Pajezinho — and run the phytochemistry. Quantify the β-carbolines. Ask whether the alkaloid profiles cluster the same way the leaves do. If they do, the chain of evidence is closed: folk category → leaf morphology → alkaloid chemistry → pharmacological effect, with each link independently verifiable.
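Whether two groupings of the same specimens "cluster the same way" can be made concrete with a pair-counting agreement score. The sketch below computes the plain Rand index; the function name, the specimen labels, and the chemotype cluster ids are all hypothetical, and no such alkaloid data exists in the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of specimen pairs on which two groupings agree.

    A pair counts as agreement if both groupings put the two specimens
    together, or both keep them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both groupings must label the same specimens")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical folk-type labels and chemotype cluster ids for six specimens.
folk_type = ["Arara", "Arara", "Hybrid", "Hybrid", "Pajezinho", "Pajezinho"]
chemotype = [0, 0, 1, 1, 2, 2]

print(rand_index(folk_type, chemotype))  # 1.0: the two groupings coincide
```

The plain Rand index is not corrected for chance agreement, so a real analysis would more likely report a chance-adjusted variant such as the adjusted Rand index; the pair-counting idea is the same.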

The remaining question, of course, is what happens when this methodology is pointed at the other plants of the ayahuasca brew complex, and at other ceremonial taxonomies in the Amazon and beyond. Diplopterys cabrerana itself has folk varieties. So does Psychotria viridis. So, almost certainly, do iboga, peyote, San Pedro, and every other ceremonially significant plant where traditional specialists distinguish kinds that Linnaean systematics flattens. There is a great deal of latent botany sitting in oral traditions, waiting for someone to ask the right question of it.

Also worth a glance — and ARDMT has now covered this in a dedicated note — is the Madrid group's recently formalised work on DMT in the 6-OHDA rat model of Parkinson's disease (PMID 42128256), reporting neuroprotective and neurorestorative effects in a unilateral lesion model.
