Look up information

Martin Westgate & Dax Kellie

2023-11-07

show_all() & search_all()

As of galah 1.5.0, there are two simplified functions to look up information: show_all() and search_all().

These are individual functions that are able to return all types of information in one place, rather than using specific sub-functions to look up information.

For example, to show all available Living Atlases supported:

show_all(atlases)
## # A tibble: 11 × 4
##    region         institution                                                  acronym url  
##    <chr>          <chr>                                                        <chr>   <chr>
##  1 Australia      Atlas of Living Australia                                    ALA     http…
##  2 Austria        Biodiversitäts-Atlas Österreich                              BAO     http…
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira    SiBBr   http…
##  4 Estonia        eElurikkus                                                   <NA>    http…
##  5 France         Portail français d'accès aux données d'observation sur les … OpenObs http…
##  6 Global         Global Biodiversity Information Facility                     GBIF    http…
##  7 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica … SNIBgt  http…
##  8 Portugal       GBIF Portugal                                                GBIF.pt http…
##  9 Spain          GBIF Spain                                                   GBIF.es http…
## 10 Sweden         Swedish Biodiversity Data Infrastructure                     SBDI    http…
## 11 United Kingdom National Biodiversity Network                                NBN     http…

To search for a specific available Living Atlas:

search_all(atlases, "Spain")
## # A tibble: 1 × 4
##   region institution acronym url                
##   <chr>  <chr>       <chr>   <chr>              
## 1 Spain  GBIF Spain  GBIF.es https://www.gbif.es

To show all fields:

show_all(fields)
## # A tibble: 646 × 3
##    id                  description               type  
##    <chr>               <chr>                     <chr> 
##  1 _nest_parent_       <NA>                      fields
##  2 _nest_path_         <NA>                      fields
##  3 _root_              <NA>                      fields
##  4 abcdTypeStatus      <NA>                      fields
##  5 acceptedNameUsage   Accepted name             fields
##  6 acceptedNameUsageID Accepted name             fields
##  7 accessRights        Access rights             fields
##  8 annotationsDoi      <NA>                      fields
##  9 annotationsUid      Referenced by publication fields
## 10 assertionUserId     Assertions by user        fields
## # ℹ 636 more rows

And to search for a specific field:

search_all(fields, "australian states")
## # A tibble: 2 × 3
##   id     description                            type  
##   <chr>  <chr>                                  <chr> 
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields

Here is a list of information types that can be used with show_all() and search_all():

Information type Description Sub-functions
Configuration
atlases Show what living atlases are available show_all_atlases(), search_atlases()
apis Show what APIs & functions are available for each atlas show_all_apis(), search_apis()
reasons Show what values are acceptable as ‘download reasons’ for a specified atlas show_all_reasons(), search_reasons()
Taxonomy
taxa Search for one or more taxonomic names search_taxa()
identifiers Take a universal identifier and return taxonomic information search_identifiers()
ranks Show valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.) show_all_ranks(), search_ranks())
Filters
fields Show fields that are stored in an atlas show_all_fields(), search_fields()
assertions Show results of data quality checks run by each atlas show_all_assertions(), search_assertions()
Group filters
profiles Show what data quality profiles are available show_all_profiles(), search_profiles()
lists Show what species lists are available show_lists(), search_lists()
Data providers
providers Show which institutions have provided data show_all_providers(), search_providers()
collections Show the specific collections within those institutions show_all_collections(), search_collections()
datasets Shows all the data groupings within those collections show_all_datasets(), search_datasets()

show_all_ subfunctions

While show_all is useful for a variety of cases, you can still call the underlying subfunctions if you prefer. These functions - with the prefix show_all_ - return a tibble doing exactly that; showing all the possible values of the category specified. These functions include:

show_all_ functions require no arguments. Simply call the function and it will return all accepted values as a tibble:

show_all_atlases()
## # A tibble: 11 × 4
##    region         institution                                                  acronym url  
##    <chr>          <chr>                                                        <chr>   <chr>
##  1 Australia      Atlas of Living Australia                                    ALA     http…
##  2 Austria        Biodiversitäts-Atlas Österreich                              BAO     http…
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira    SiBBr   http…
##  4 Estonia        eElurikkus                                                   <NA>    http…
##  5 France         Portail français d'accès aux données d'observation sur les … OpenObs http…
##  6 Global         Global Biodiversity Information Facility                     GBIF    http…
##  7 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica … SNIBgt  http…
##  8 Portugal       GBIF Portugal                                                GBIF.pt http…
##  9 Spain          GBIF Spain                                                   GBIF.es http…
## 10 Sweden         Swedish Biodiversity Data Infrastructure                     SBDI    http…
## 11 United Kingdom National Biodiversity Network                                NBN     http…
show_all_reasons()
## # A tibble: 13 × 2
##       id name                            
##    <int> <chr>                           
##  1     1 biosecurity management/planning 
##  2    11 citizen science                 
##  3     5 collection management           
##  4     0 conservation management/planning
##  5     7 ecological research             
##  6     3 education                       
##  7     2 environmental assessment        
##  8    12 restoration/remediation         
##  9     4 scientific research             
## 10     8 systematic research/taxonomy    
## 11    13 species modelling               
## 12     6 other                           
## 13    10 testing

search_ subfunctions

The second subset of lookup subfunctions use the search_ prefix, and differ from show_all_ in that they require a query to work. They are used to search for detailed information that can’t be summarised across the whole atlas, and include:

Search for a single taxon or multiple taxa by name with search_taxa.

search_taxa("reptilia")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id  rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>             <chr> <chr>      <chr>   <chr>  <chr> <chr> 
## 1 reptilia    REPTILIA        https://biodiver… class exactMatch Animal… Chord… Rept… noIss…
search_taxa("reptilia", "aves", "mammalia", "pisces")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id  rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>             <chr> <chr>      <chr>   <chr>  <chr> <chr> 
## 1 reptilia    REPTILIA        https://biodiver… class exactMatch Animal… Chord… Rept… noIss…

Alternatively, search_identifiers is the partner function to search_taxa. If we already know a taxonomic identifier, we can search for which taxa the identifier belongs to with search_identifiers:

search_identifiers("urn:lsid:biodiversity.org.au:afd.taxon:682e1228-5b3c-45ff-833b-550efd40c399")
## # A tibble: 1 × 15
##   search_term  success scientific_name taxon_concept_id rank  rank_id   lft   rgt match_type
##   <chr>        <lgl>   <chr>           <chr>            <chr>   <int> <int> <int> <chr>     
## 1 urn:lsid:bi… TRUE    REPTILIA        https://biodive… class    3000 46718 49924 taxonIdMa…
## # ℹ 6 more variables: kingdom <chr>, kingdom_id <chr>, phylum <chr>, phylum_id <chr>,
## #   class <chr>, class_id <chr>

Sifting through the output of show_all_fields to find a specific field can be inefficient. Instead, we might wish to use search_fields to look for specific fields that match a search. As with search_taxa, search_fields requires a query to work.

search_fields("date") |> head()
## # A tibble: 6 × 3
##   id             description     type  
##   <chr>          <chr>           <chr> 
## 1 eventDate      Event Date      fields
## 2 lastLoadDate   lastLoadDate    fields
## 3 datePrecision  Date precision  fields
## 4 eventDateEnd   <NA>            fields
## 5 dateIdentified Date Identified fields
## 6 raw_eventDate  <NA>            fields

show_values() & search_values()

Once a desired field is found, you can use show_values to understand the information contained within that field, e.g.

search_all(fields, "basis") |> show_values()
## ! Search returned 2 matched fields.
## • Showing values for 'basisOfRecord'.
## # A tibble: 9 × 1
##   basisOfRecord      
##   <chr>              
## 1 Human observation  
## 2 Preserved specimen 
## 3 Observation        
## 4 Occurrence         
## 5 Machine observation
## 6 Material Sample    
## 7 Living specimen    
## 8 Material Citation  
## 9 Fossil specimen

This provides the information you need to pass meaningful queries to galah_filter.

galah_call() |> 
  galah_filter(basisOfRecord == "LIVING_SPECIMEN") |> 
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 126135

This works for other types of query, such as data profiles:

search_all(profiles, "ALA") |> 
  show_values() |> 
  head()
## • Showing values for 'ALA'.
## # A tibble: 6 × 5
##      id enabled description                                              filter displayOrder
##   <int> <lgl>   <chr>                                                    <chr>         <int>
## 1    94 TRUE    "Exclude all records where spatial validity is \"false\… "-spa…            1
## 2    96 TRUE    "Exclude all records with an assertion that the scienti… "-ass…            1
## 3    97 TRUE    "Exclude all records with an assertion that the scienti… "-ass…            2
## 4    98 TRUE    "Exclude all records with an assertion that the name an… "-ass…            3
## 5    99 TRUE    "Exclude all records with an assertion that kingdom pro… "-ass…            4
## 6   100 TRUE    "Exclude all records with an assertion that the scienti… "-ass…            5