Taxonomic filtering

Callum Waite, Shandiya Balasubramaniam

2023-10-13

Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but there are a few ways to ensure records are not missed.

library(galah)
library(dplyr)
galah_config(email = "your_email_here", verbose = FALSE)

search_taxa()

search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that a search term matches the intended taxon name in the Atlas of Living Australia (ALA). It returns the scientific name, authorship, rank, and full classification of the taxon matched to each search term.

search_taxa("Petroica boodang") |> gt::gt()
search_term scientific_name scientific_name_authorship taxon_concept_id rank match_type kingdom phylum class order family genus species vernacular_name issues
Petroica boodang Petroica (Petroica) boodang (Lesson, 1838) https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica boodang Scarlet Robin noIssue
# Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
# Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
search_taxa("Muscicapa chrysoptera", "Guniibuu") |> gt::gt()

search_term scientific_name scientific_name_authorship taxon_concept_id rank match_type kingdom phylum class order family genus species vernacular_name issues
Muscicapa chrysoptera Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica phoenicea Flame Robin noIssue
Guniibuu Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 species vernacularMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica goodenovii Red-capped Robin noIssue

Where homonyms exist, search_taxa() prompts users to clarify the search term by providing one or more taxonomic ranks in a tibble. This example disambiguates the genus name Morganella, which is used in three kingdoms:

search_taxa("Morganella") |> gt::gt()
## Warning: Search returned multiple taxa due to a homonym issue.
## ℹ Please provide another rank in your search to clarify taxa.
## ℹ Use a `tibble` to clarify taxa, see `?search_taxa`.
## ✖ Homonym issue with "Morganella".
search_term issues
Morganella homonym
search_taxa(tibble(kingdom = "Fungi", genus = "Morganella")) |> gt::gt()

search_term scientific_name scientific_name_authorship taxon_concept_id rank match_type kingdom phylum class order family genus issues
Fungi_Morganella Morganella Zeller https://id.biodiversity.org.au/node/fungi/60091999 genus exactMatch Fungi Basidiomycota Agaricomycetes Agaricales Agaricaceae Morganella noIssue

galah_identify()

galah_identify() is similar to search_taxa(), except that it can be used within a piped workflow to retrieve counts, species, or records e.g.

galah_call() |>
  galah_identify("Petroica boodang") |>
  atlas_counts() 
## # A tibble: 1 × 1
##    count
##    <int>
## 1 118909
galah_call() |>
  galah_identify("Muscicapa chrysoptera", "Guniibuu") |>
  atlas_species() |> 
  gt::gt()
kingdom phylum class order family genus species author species_guid vernacular_name
Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 Red-capped Robin
Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 Flame Robin
galah_call() |>
  galah_identify(tibble(kingdom = "Fungi", genus = "Morganella")) |>
  atlas_occurrences() |>
  head() |> 
  gt::gt()
recordID decimalLatitude decimalLongitude eventDate scientificName taxonConceptID dataResourceName occurrenceStatus
001ec30d-3376-4f63-ba32-b48bc3dd137d -33.66218 150.2708 2021-04-10 Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 NSW BioNet Atlas PRESENT
02ba39cd-8077-4868-a0ff-e50765089788 -41.28082 174.7943 NA Morganella compacta NZOR-6-128055 New Zealand Virtual Herbarium PRESENT
0422009d-c1f0-4e2e-8d29-e88df6de2049 -36.44838 174.6714 NA Morganella compacta NZOR-6-128055 New Zealand Virtual Herbarium PRESENT
04aeeb8d-0538-477c-aff1-29574eafa349 -36.84225 174.4695 1993-06-19 Morganella compacta NZOR-6-128055 New Zealand Virtual Herbarium PRESENT
092b6f3e-ef27-4cbf-bb28-b172c1b200c5 -26.77940 152.8803 2009-02-19 Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 National Herbarium of Victoria (MEL) AVH data PRESENT
0be95d59-29a4-4475-8586-a497cac607f7 -43.15002 171.7305 NA Morganella compacta NZOR-6-128055 New Zealand Virtual Herbarium PRESENT


We recommend using search_taxa() before calling galah_identify() in a workflow to confirm that each search term matches the intended taxon. The taxon identifiers returned by search_taxa() can then be passed to galah_identify() with search = FALSE, which skips a second name lookup and so also speeds up the query.

robins <- search_taxa("Muscicapa chrysoptera", "Guniibuu") 

galah_call() |>
  galah_identify(robins$taxon_concept_id, search = FALSE) |>
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 191277
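The check recommended above can also be made explicit in code. The following is a minimal sketch, not part of galah: checked_ids() is a hypothetical helper that returns identifiers only when every search term has an issues value of "noIssue". To keep it runnable offline, the robins tibble is typed in from the search_taxa() output shown earlier rather than queried live.

```r
library(tibble)

# Hypothetical helper (not a galah function): return taxon identifiers only
# when every search term was matched without an issue.
checked_ids <- function(taxa) {
  unresolved <- taxa$search_term[!(taxa$issues %in% "noIssue")]
  if (length(unresolved) > 0) {
    stop("Unresolved search terms: ", paste(unresolved, collapse = ", "))
  }
  taxa$taxon_concept_id
}

# Rows copied from the search_taxa() output shown earlier, trimmed to three columns
robins <- tibble(
  search_term = c("Muscicapa chrysoptera", "Guniibuu"),
  taxon_concept_id = c(
    "https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552",
    "https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1"
  ),
  issues = c("noIssue", "noIssue")
)

robin_ids <- checked_ids(robins)

# With a live connection, the identifiers could then be passed on:
# galah_call() |>
#   galah_identify(robin_ids, search = FALSE) |>
#   atlas_counts()
```

A homonym result such as the one returned for "Morganella" would stop the helper with an error before any query is sent.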


galah_filter()

galah_filter() subsets records by searching for exact matches to an expression, and may also be used for taxonomic filtering e.g.

galah_call() |>
  galah_filter(species == "Petroica boodang") |>
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 118909
aus_petroica <- c("Petroica boodang", "Petroica goodenovii", 
                  "Petroica phoenicea", "Petroica rosea",
                  "Petroica rodinogaster", "Petroica multicolor")

galah_call() |>
  galah_filter(species == aus_petroica) |>
  galah_group_by(species, vernacularName) |>
  atlas_counts() |> 
  gt::gt()
species                vernacularName         count
Petroica boodang       Scarlet Robin         118733
Petroica goodenovii    Red-capped Robin      110021
Petroica phoenicea     Flame Robin            81256
Petroica rosea         Rose Robin             52301
Petroica rodinogaster  Pink Robin             13361
Petroica multicolor    Norfolk Island Robin    6574

This can be useful when searching for paraphyletic or polyphyletic groups that are defined by excluding a taxon, which cannot be done using galah_identify(). For example, to get counts of non-chordate animals by phylum:

galah_call() |>
  galah_filter(kingdom == "Animalia", phylum != "Chordata") |>
  galah_group_by(phylum) |>
  atlas_counts() |>
  head() |> 
  gt::gt()
phylum           count
Arthropoda     8070909
Mollusca       1274134
Annelida        313177
Cnidaria        274188
Echinodermata   190004
Porifera        130139

galah_filter(), galah_identify(), and taxonomic ranks

Deciding between using galah_filter() and galah_identify() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with galah_filter() and galah_identify(). Additionally, there is a field named scientificName, which refers to the lowest taxonomic rank to which a record has been identified e.g.

galah_call() |>
  galah_identify(tibble(genus = "Pitta")) |>
  galah_group_by(scientificName, taxonRank) |>
  atlas_counts() |>
  filter(!is.na(scientificName)) |>
  gt::gt()
scientificName                               taxonRank   count
Pitta (Pitta) versicolor                     Species     25996
Pitta (Pitta) iris                           Species      5716
Pitta (Erythropitta)                         Subgenus      724
Pitta (Pitta) versicolor versicolor          Subspecies    302
Pitta (Erythropitta) erythrogaster           Species       190
Pitta                                        Genus          72
Pitta (Pitta) iris iris                      Subspecies     70
Pitta (Pitta) versicolor intermedia          Subspecies     42
Pitta (Pitta) versicolor simillima           Subspecies     36
Pitta (Pitta) iris johnstoneiana             Subspecies     27
Pitta (Erythropitta) erythrogaster digglesi  Subspecies     21

If you have the correct species or subspecies name, searching for matches against the species or subspecies field, respectively, will give more precise results. This is because values in scientificName may include a subgenus (for example, Pitta (Pitta) versicolor), so a plain binomial may not match them exactly. If you have used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.
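To see how much the level of identification varies within a single genus, the grouped counts for Pitta can be summarised by taxonRank. This is a minimal offline sketch: the counts are typed in from the Pitta table in this section rather than queried live, and the object names (pitta, rank_totals, species_only) are illustrative only.

```r
library(dplyr)
library(tibble)

# Counts copied from the Pitta table in this section (not a live query)
pitta <- tribble(
  ~scientificName,                               ~taxonRank,   ~count,
  "Pitta (Pitta) versicolor",                    "Species",     25996,
  "Pitta (Pitta) iris",                          "Species",      5716,
  "Pitta (Erythropitta)",                        "Subgenus",      724,
  "Pitta (Pitta) versicolor versicolor",         "Subspecies",    302,
  "Pitta (Erythropitta) erythrogaster",          "Species",       190,
  "Pitta",                                       "Genus",          72,
  "Pitta (Pitta) iris iris",                     "Subspecies",     70,
  "Pitta (Pitta) versicolor intermedia",         "Subspecies",     42,
  "Pitta (Pitta) versicolor simillima",          "Subspecies",     36,
  "Pitta (Pitta) iris johnstoneiana",            "Subspecies",     27,
  "Pitta (Erythropitta) erythrogaster digglesi", "Subspecies",     21
)

# Total records at each level of identification
rank_totals <- pitta |>
  group_by(taxonRank) |>
  summarise(records = sum(count)) |>
  arrange(desc(records))

# Names identified exactly to species level, excluding subspecies and coarser ranks
species_only <- pitta |>
  filter(taxonRank == "Species")
```

The rows in species_only are the scientificName values that would be matched when only species-level identifications are wanted.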

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa() and galah_identify() rather than galah_filter(). In the example below, search_taxa() matches the search terms to one genus, three species, and two subspecies. The same terms can then be passed to galah_identify() in a piped workflow.

tas_endemic <- c("Sarcophilus", # Tasmanian Devil
                 "Bettongia gaimardi", # Tasmanian Bettong
                 "Melanodryas vittata", # Dusky Robin
                 "Platycercus caledonicus",# Green Rosella
                 "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
                 "Tyto novaehollandiae castanops") # Tasmanian Masked Owl

search_taxa(tas_endemic) |> gt::gt()
search_term scientific_name scientific_name_authorship taxon_concept_id rank match_type kingdom phylum class order family genus species vernacular_name issues
Sarcophilus Sarcophilus Cuvier, 1837 https://biodiversity.org.au/afd/taxa/06455b77-7d50-4ec7-9122-8ab48cfb0c1c genus exactMatch Animalia Chordata Mammalia Dasyuromorphia Dasyuridae Sarcophilus NA NA noIssue
Bettongia gaimardi Bettongia gaimardi (Desmarest, 1822) https://biodiversity.org.au/afd/taxa/19c9bfdf-0fbd-4b1b-a7c4-64498290d059 species exactMatch Animalia Chordata Mammalia Diprotodontia Potoroidae Bettongia Bettongia gaimardi Tasmanian Bettong noIssue
Melanodryas vittata Melanodryas (Amaurodryas) vittata (Quoy & Gaimard, 1830) https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Melanodryas Melanodryas vittata Dusky Robin noIssue
Platycercus caledonicus Platycercus (Platycercus) caledonicus (Gmelin, 1788) https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 species exactMatch Animalia Chordata Aves Psittaciformes Psittacidae Platycercus Platycercus caledonicus Green Rosella noIssue
Aquila audax fleayi Aquila (Uroaetus) audax fleayi Condon & Amadon, 1954 https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 subspecies exactMatch Animalia Chordata Aves Accipitriformes Accipitridae Aquila Aquila audax NA noIssue
Tyto novaehollandiae castanops Tyto novaehollandiae castanops (Gould, 1837) https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec subspecies exactMatch Animalia Chordata Aves Strigiformes Tytonidae Tyto Tyto novaehollandiae NA noIssue
galah_call() |>
  galah_identify(tas_endemic) |>
  galah_group_by(scientificName) |>
  atlas_counts() |>
  arrange(scientificName) |>
  gt::gt()
scientificName                                     count
Aquila (Uroaetus) audax fleayi                      4896
Bettongia gaimardi                                  1162
Bettongia gaimardi cuniculus                          41
Bettongia gaimardi gaimardi                            9
Melanodryas (Amaurodryas) vittata                  14104
Melanodryas (Amaurodryas) vittata kingi               15
Melanodryas (Amaurodryas) vittata vittata             34
Platycercus (Platycercus) caledonicus              43285
Platycercus (Platycercus) caledonicus brownii         24
Platycercus (Platycercus) caledonicus caledonicus     33
Sarcophilus                                            3
Sarcophilus harrisii                               36219
Tyto novaehollandiae castanops                        62
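
The counts above include taxa nested beneath each search term (for example, Sarcophilus harrisii under the genus Sarcophilus), so it can help to tally which ranks the terms were matched to before interpreting them. This is a small sketch using the rank column of the search_taxa() output for tas_endemic; the values are typed in from that output rather than queried live, and tas_matches and rank_tally are illustrative names.

```r
library(dplyr)
library(tibble)

# Search terms and matched ranks copied from the search_taxa() output for tas_endemic
tas_matches <- tibble(
  search_term = c("Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
                  "Platycercus caledonicus", "Aquila audax fleayi",
                  "Tyto novaehollandiae castanops"),
  rank = c("genus", "species", "species", "species", "subspecies", "subspecies")
)

# Number of search terms matched at each rank
rank_tally <- tas_matches |>
  count(rank)
```

With a live result, the same count(rank) call could be applied directly to the tibble returned by search_taxa(tas_endemic).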