APCalign: an R package workflow and app for aligning and updating flora names to the Australian Plant Census
Elizabeth H. Wenk A*, William K. Cornwell A, Anne Fuchs B, Fonti Kar A, Anna M. Monro B, Hervé Sauquet A,C, Ruby E. Stephens A,D and Daniel S. Falster A
Abstract
Here we present ‘APCalign’, an R package and accompanying browser-based application that aligns and updates scientific names of Australian vascular plants to the most likely currently accepted name in the Australian Plant Census (APC) or to a name in the Australian Plant Names Index (APNI). Scientific names are the labels the scientific community assigns to unique taxon concepts. This shared terminology is useful only if each taxon concept is consistently referred to by the same name. Spelling mistakes or taxonomic changes can break the link between a name and its taxon concept. Automated tools are therefore required to resolve taxon lists, aligning and updating long lists of possibly erroneous scientific names to the most likely currently accepted names. Tools specific to the APC/APNI are essential for two reasons. First, these lists specify an endorsed national-level nomenclature used in government legislation. Second, they include phrase names, a uniquely Australian concept absent from global taxonomic datasets. To align input names to names within the APC or APNI, ‘APCalign’ works progressively through a sequence of checks. These checks combine different permutations of the input name, exact versus fuzzy matches, matches on the entire input name versus a subset of its words, and character strings indicating that a name can be resolved only to genus or family. Where possible, the aligned names are then updated to a currently accepted taxon concept within the APC. The package should facilitate any research output that requires diverse scientific name lists to be merged or outdated lists to be updated.
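The progressive alignment described above can be illustrated with a short sketch. The Python below is a hypothetical, simplified stand-in and not APCalign's R implementation. The toy name lists, the function names (`align`, `update`, `fuzzy_match`, `standardise`), the two-edit fuzzy threshold and the order of checks are all assumptions chosen for illustration. The sketch proceeds in four stages:

1. It tries an exact match and then a fuzzy match on the full name.
2. It repeats both checks on the name's first two words.
3. It falls back to a genus-level name.
4. It updates an aligned synonym to its accepted name.

```python
# Hypothetical sketch of a progressive name-alignment cascade, in the spirit of the
# checks described above. Not APCalign's implementation (APCalign is written in R).

# Toy stand-ins for APC names (hypothetical data, for illustration only).
ACCEPTED = {"Eucalyptus regnans", "Acacia dealbata", "Banksia serrata"}
SYNONYMS = {"Racosperma dealbatum": "Acacia dealbata"}  # outdated name -> accepted name
GENERA = {"Eucalyptus", "Acacia", "Banksia"}
ALL_NAMES = sorted(ACCEPTED | set(SYNONYMS))


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def standardise(name: str) -> str:
    """Collapse whitespace, capitalise the genus, lower-case the other words (crude)."""
    words = name.split()
    if not words:
        return ""
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def fuzzy_match(name: str, max_dist: int = 2):
    """Return the closest known name within max_dist edits, or None."""
    best = min(ALL_NAMES, key=lambda c: (levenshtein(name, c), c))
    return best if levenshtein(name, best) <= max_dist else None


def align(name: str):
    """Return (aligned_name, step) from the first check that succeeds."""
    full = standardise(name)
    if not full:
        return None, "no_match"
    first_two = " ".join(full.split()[:2])  # drops authorities and other trailing text
    checks = [  # ordered from most to least stringent
        ("exact", lambda: full if full in ALL_NAMES else None),
        ("fuzzy", lambda: fuzzy_match(full)),
        ("first_two_words_exact", lambda: first_two if first_two in ALL_NAMES else None),
        ("first_two_words_fuzzy", lambda: fuzzy_match(first_two)),
    ]
    for step, check in checks:
        hit = check()
        if hit is not None:
            return hit, step
    # Last resort: the name can only be resolved to genus level.
    genus = full.split()[0]
    if genus in GENERA:
        return f"{genus} sp.", "genus"
    return None, "no_match"


def update(aligned):
    """Replace an outdated (synonym) name with its currently accepted name."""
    return SYNONYMS.get(aligned, aligned)
```

For example, `align("Eucalyptus regnens F.Muell.")` fails the full-name checks because of the trailing authority. It then fuzzy-matches its first two words to `"Eucalyptus regnans"`. Separately, `update("Racosperma dealbatum")` returns `"Acacia dealbata"`. The checks are ordered so that each name is resolved at the strictest level available. A production tool would likely need further safeguards. One example is restricting fuzzy matches so that a single-letter typo cannot silently jump to a different, valid species.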
Keywords: Australian Plant Census, biodiversity informatics, bioinformatics, conservation biology, plant taxonomy, R-package, taxon concept, vascular plants.