Data Paper
Corresponding author: Wesley Tack (wesley.tack@plantentuinmeise.be)
Academic editor: Manuel Luján
© 2022 Wesley Tack, Henry Engledow, Nuno Veríssimo Pereira, Christian Amani, Steven P. Bachman, Patricia Barberá, Henk J. Beentje, Gaël U. D. Bouka, Martin Cheek, Ariane Cosiaux, Gilles Dauby, Petra De Block, Corneille E. N. Ewango, Eberhard Fischer, Roy E. Gereau, Serene Hargreaves, Yvette Harvey-Brown, Davy U. Ikabanga, Edouard Ilunga wa Ilunga, James Kalema, Peris Kamau, Olivier Lachenaud, Quentin Luke, Ithe Mwanga Mwanga, Sydney T. Ndolo Ebika, Jacques Nkengurutse, Aimable Nsanzurwimo, Salvator Ntore, Sophie L. Richards, Reddy Shutsha Ehata, Murielle Simo-Droissart, Tariq Stévart, Marc S. M. Sosef.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Tack W, Engledow H, Veríssimo Pereira N, Amani C, Bachman SP, Barberá P, Beentje HJ, Bouka GUD, Cheek M, Cosiaux A, Dauby G, De Block P, Ewango CEN, Fischer E, Gereau RE, Hargreaves S, Harvey-Brown Y, Ikabanga DU, Ilunga wa Ilunga E, Kalema J, Kamau P, Lachenaud O, Luke Q, Mwanga Mwanga I, Ndolo Ebika ST, Nkengurutse J, Nsanzurwimo A, Ntore S, Richards SL, Shutsha Ehata R, Simo-Droissart M, Stévart T, Sosef MSM (2022) The ECAT dataset: expert-validated distribution data of endemic and sub-endemic trees of Central Africa (Dem. Rep. Congo, Rwanda, Burundi). PhytoKeys 206: 137-151. https://doi.org/10.3897/phytokeys.206.77379
In this data paper, we present a specimen-based occurrence dataset compiled in the framework of the Conservation of Endemic Central African Trees (ECAT) project with the aim of producing global conservation assessments for the IUCN Red List. The project targets all tree species endemic or sub-endemic to the Central African region comprising the Democratic Republic of the Congo (DR Congo), Rwanda, and Burundi. The dataset contains 6361 plant collection records with occurrences of 8910 specimens from 337 taxa belonging to 153 genera in 52 families. Many of these tree taxa have restricted geographic ranges and are only known from a small number of herbarium specimens. As assessments for such taxa can be compromised by inadequate data, we transcribed and geo-referenced specimen label information to obtain a more accurate and complete locality dataset. All specimen data were manually cleaned and verified by botanical experts, resulting in improved data quality and consistency.
Africa, conservation, data capture, data cleaning, endemics, flora, flowering plants, geographic range, herbarium, IUCN Red List, threatened
The alarming rate of biodiversity loss worldwide has increased the need to conduct conservation assessments for the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (
The IUCN Red List assessment procedure uses numerical thresholds within five criteria to classify taxa according to their relative risk of extinction. Criterion B (restricted geographic range) is the most frequently used for plants (
The first and perhaps most time-consuming task in preparing conservation assessments under criterion B is to obtain a realistic view of a taxon’s past and current distribution. For tropical plants, this is generally derived from herbarium specimens. Although large-scale digitisation programmes have increased the availability of digital biodiversity data, the specimen data are far from complete, up-to-date, accurate, or clean (
Here, we provide a high-quality, expert-validated occurrence dataset compiled by the ECAT project, which is part of the larger Global Tree Assessment (GTA) coordinated by Botanic Gardens Conservation International (
Conservation of Endemic Central African Trees (ECAT) through IUCN Red Listing and Species Distribution Modelling.
Funding for the ECAT project was provided by the Franklinia Foundation, with a substantial in-kind contribution from Meise Botanic Garden and Missouri Botanical Garden.
Central Africa, as defined in this study, covers a total of 2.4 million square km, comprising the countries of DR Congo, Rwanda, and Burundi and stretching from a narrow coastal strip at the western border of DR Congo (excluding the Cabinda enclave) to the montane region of the Albertine Rift. The core of this region consists of the Congo Basin, which is the second largest tropical forest area in the world after the Amazon Basin, with much of the area being at low elevation (below 600 m). The natural vegetation of the Congo Basin is classified as Guineo-Congolian rainforest on well-drained sites, with swamp forest on hydromorphic soils (
Based on data available at Meise Botanic Garden, supplemented with data from the BGCI GlobalTreeSearch (
We differentiated between endemic and sub-endemic taxa based on their spatial distribution relative to the land borders of DR Congo, Rwanda, and Burundi. We considered 219 taxa as Central African true endemics as their current distribution range is restricted to DR Congo (186 taxa), Rwanda (3), Burundi (2), or a combination of these three countries (28). The remaining 128 taxa from our list were deemed sub-endemic to Central Africa. For 116 of these sub-endemic taxa, all herbarium specimens in our dataset originated from the area delineating DR Congo, Rwanda, and Burundi, extended by a 5-degree buffer zone. For the remaining 12 sub-endemic taxa in our study, most specimens were from Central Africa (70–94%), with only a few collected outside the 5-degree buffer zone (1–23%).
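The distinction drawn above can be illustrated with a minimal Python sketch. It assumes that each specimen has already been tagged with an ISO country code and with a boolean stating whether it lies inside the three countries or the surrounding 5-degree buffer. The function name, the labels, and the input layout are hypothetical and are not part of the ECAT workflow, which relied on expert knowledge of each taxon's range rather than on specimen counts alone.

```python
# ISO 3166-1 alpha-2 codes for DR Congo, Rwanda, and Burundi.
CORE_COUNTRIES = {"CD", "RW", "BI"}


def classify_taxon(specimens):
    """Label a taxon from its specimens (illustrative rule only).

    `specimens` is a list of (country_code, within_buffer) tuples, where
    `within_buffer` is True if the point lies inside the three countries
    or the surrounding 5-degree buffer (assumed to be precomputed).
    Returns (label, share_of_specimens_outside_buffer).
    """
    if not specimens:
        raise ValueError("a taxon needs at least one specimen")
    total = len(specimens)
    in_core = sum(1 for country, _ in specimens if country in CORE_COUNTRIES)
    outside = sum(1 for _, within_buffer in specimens if not within_buffer)
    share_outside = outside / total
    if in_core == total:
        label = "endemic"
    elif outside == 0:
        label = "sub-endemic (all within buffer)"
    else:
        label = "sub-endemic (some outside buffer)"
    return label, share_outside


# Hypothetical inputs, one per category.
print(classify_taxon([("CD", True), ("RW", True)]))
print(classify_taxon([("CD", True), ("UG", True)]))
print(classify_taxon([("CD", True)] * 3 + [("CM", False)]))
```

Returning the share of specimens outside the buffer alongside the label makes it easy to see how close a taxon sits to the boundary between the two sub-endemic groups.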
We retrieved the specimen data for these taxa and their synonyms from our institutional collection database (BR; all herbarium acronyms according to
Transcription of specimen labels is often restricted to selected data fields due to resource constraints. As a result, a considerable amount of descriptive information relevant to Red List assessments may be missing from specimen databases. To enrich our data, we transcribed specimen label data, focusing mainly on gaps in the locality description, habitat, and elevation. The newly transcribed data allowed us to geo-reference several specimens without coordinates and improve the geo-referencing accuracy of others. Although recent herbarium specimens increasingly carry accurate coordinates captured in real time using a GPS device, this is not the case for the bulk of the Central African collection at BR, which predates GPS devices. It was often possible to infer the geographic coordinates from the transcribed data using historical topographic maps and gazetteers or by checking the collector’s itinerary. Specimen records that could not be geo-referenced because the locality description was missing, too vague, or unclear (e.g., illegible handwriting) were removed from the dataset.
The dataset was checked for any spatial errors through an iterative series of inspections. First, we used the R package CoordinateCleaner version 2.0–18 (
Finally, the expert botanists carrying out the Red List assessments verified all occurrences for each taxon, paying particular attention to spatial outliers that could indicate an error in a specimen’s identification or geo-referencing. Verification of taxonomic identification involved physically examining the herbarium specimens or, at a minimum, checking an image scan online where applicable. The experts not only detected (and rectified where possible) taxonomic and geographic errors but also identified unsuitable records, such as those of cultivated specimens or of occurrences that are now locally extinct (e.g., due to habitat loss). Including such records in the calculation of the EOO or number of locations could result in an underestimation of extinction risk (
The initial raw dataset contained data from 9956 specimens. The majority of these (83.4%) were deposited in the herbarium of Meise Botanic Garden, underlining its importance for the flora of Central Africa. Other herbaria represented in the dataset are B, BM, BRLU, C, COI, EA, EALA, EPU, FHO, GENT, H, HBG, IEC, IUK, K, KAW, KISA, LBV, LG, LISC, LISU, LSHI, LUKI, LWI, M, MA, MB, MHU, MO, MPU, NDO, NHR, NHT, P, PRE, SRGH, UPS, W, WAG, and YBI.
As part of the data enrichment, we transcribed locality data for 690 specimens, habitat data for 2802 specimens, and elevation data for 3796 specimens (this includes values indicating that information is ‘known to be unknown’). One-third of the specimens (33.1%) had no coordinates. During the ECAT project, 2923 specimens were geo-referenced, leaving 372 without spatial data. The new coordinates were derived mainly from maps and gazetteers; only a small number (374) could be copied from duplicates. After several quality checks on the geo-referencing, we adjusted the coordinates for 1774 specimens. For three-quarters of them, the adjustment was relatively minor, moving the occurrence by up to 10 km. For the remaining quarter, the shift exceeded 10 km (up to as much as 3915 km). The taxonomic identification was updated for 509 specimens (changes due to synonymy or misspellings not taken into account). We removed 1046 specimens from the dataset on taxonomic or spatial grounds, leaving 8910 specimens in the cleaned dataset. After merging all duplicate specimens, we obtained a dataset with 6361 geo-referenced plant collection records.
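The last step, collapsing 8910 specimens into 6361 collection records, can be sketched in Python as follows. The text does not say how duplicates were matched, so the rule used here (same collector and same collector number) is an assumption, as are the function name and the sample values. Only the field names `recordedBy`, `recordNumber`, and `catalogNumber` come from the dataset's Darwin Core fields.

```python
from collections import defaultdict


def merge_duplicates(specimens):
    """Group specimen dicts into collection records.

    Assumed rule: specimens sharing a collector and collector number are
    duplicates of one gathering. Specimens without a usable number
    (empty or "s.n.") are never merged; each forms its own record.
    """
    groups = defaultdict(list)
    for spec in specimens:
        collector = spec["recordedBy"].strip().lower()
        number = spec["recordNumber"].strip().lower()
        if number in ("", "s.n."):
            # A 3-tuple key cannot collide with the 2-tuple keys below.
            key = ("", "", spec["catalogNumber"])
        else:
            key = (collector, number)
        groups[key].append(spec)
    return [
        {
            "recordedBy": members[0]["recordedBy"],
            "recordNumber": members[0]["recordNumber"],
            "catalogNumbers": sorted(m["catalogNumber"] for m in members),
        }
        for members in groups.values()
    ]


# Hypothetical specimens: the first two are duplicates of one gathering.
specimens = [
    {"recordedBy": "Collector A", "recordNumber": "101", "catalogNumber": "X001"},
    {"recordedBy": "collector a", "recordNumber": "101", "catalogNumber": "Y042"},
    {"recordedBy": "Collector A", "recordNumber": "s.n.", "catalogNumber": "X002"},
    {"recordedBy": "Collector B", "recordNumber": "7", "catalogNumber": "X003"},
]
records = merge_duplicates(specimens)
print(len(specimens), "specimens ->", len(records), "records")
```

In practice, collector names are often spelled inconsistently across herbaria, so a real merge would likely need name normalisation beyond the simple lower-casing shown here.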
The ECAT dataset contains distribution data of 337 taxa at specific or infraspecific level (subspecies or variety) belonging to 153 genera in 52 families and 20 orders. The family classification follows APG IV (
Kingdom: Plantae.
Division: Magnoliophyta.
Class: Magnoliopsida.
Order: Apiales, Arecales, Asterales, Boraginales, Brassicales, Celastrales, Ericales, Fabales, Gentianales, Geraniales, Lamiales, Laurales, Magnoliales, Malpighiales, Malvales, Myrtales, Proteales, Rosales, Santalales, Sapindales.
Family: Achariaceae, Anacardiaceae, Annonaceae, Apocynaceae, Araliaceae, Arecaceae, Asteraceae, Bignoniaceae, Boraginaceae, Burseraceae, Capparaceae, Celastraceae, Chrysobalanaceae, Clusiaceae, Combretaceae, Dichapetalaceae, Dipterocarpaceae, Ebenaceae, Euphorbiaceae, Fabaceae, Hypericaceae, Lamiaceae, Lauraceae, Linaceae, Malpighiaceae, Malvaceae, Melastomataceae, Meliaceae, Melianthaceae, Moraceae, Myrsinaceae, Myrtaceae, Octoknemaceae, Pandaceae, Pentaphylacaceae, Phyllanthaceae, Picrodendraceae, Pittosporaceae, Proteaceae, Putranjivaceae, Rhamnaceae, Rhizophoraceae, Rosaceae, Rubiaceae, Rutaceae, Salicaceae, Santalaceae, Sapindaceae, Sapotaceae, Scytopetalaceae, Thymelaeaceae, Violaceae.
Common names: flowering plants.
The occurrence data are relatively well distributed, albeit unevenly over the study area (Fig.
Spatial coverage: Number of specimens (total: 8910) and number of records (total: 6361) per country.
Country | No. specimens | No. records |
---|---|---|
Democratic Republic of the Congo | 7750 | 5329 |
Rwanda | 413 | 353 |
Uganda | 186 | 184 |
Burundi | 172 | 150 |
Republic of the Congo | 143 | 123 |
Zambia | 70 | 66 |
Gabon | 44 | 41 |
The United Republic of Tanzania | 37 | 37 |
Central African Republic | 36 | 31 |
Cameroon | 32 | 22 |
Angola | 21 | 20 |
South Sudan | 4 | 3 |
Equatorial Guinea (mainland) | 2 | 2 |
Total | 8910 | 6361 |
15°49'20"S to 06°31'00"N latitude; 08°48'00"E to 38°30'00"E longitude.
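The bounding box above is given in degrees, minutes, and seconds, whereas the dataset's `decimalLatitude` and `decimalLongitude` fields use decimal degrees. The following Python sketch shows one way to convert between the two. The function name and the accepted input pattern are assumptions chosen to match the notation used here.

```python
import re

# Matches strings such as 15°49'20"S, accepting straight or curly marks.
_DMS = re.compile(r"""(\d+)°(\d+)['’](\d+(?:\.\d+)?)["”]([NSEW])""")


def dms_to_decimal(text):
    """Convert a degrees-minutes-seconds string to signed decimal degrees."""
    match = _DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if hemisphere in "SW" else value


# Corners of the bounding box quoted in the text.
for corner in ("15°49'20\"S", "06°31'00\"N", "08°48'00\"E", "38°30'00\"E"):
    print(corner, "->", round(dms_to_decimal(corner), 4))
```

The resulting values can be used as a coarse sanity filter: any occurrence whose decimal coordinates fall outside this box is inconsistent with the stated spatial coverage.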
The ECAT dataset includes specimens collected between 1882 and 2019, with 206 records not having a date (Fig.
For a tropical region to be considered reasonably well-known botanically (vascular plants), a rule of thumb is that the minimal level of botanical exploration should be at least 100 specimens per 100 km² (
Object name: Darwin Core Archive ECAT: Endemic and sub-endemic Central African Trees.
Character encoding: ISO-8859-1.
Format name: Darwin Core Archive format.
Format version: 1.5.
Distribution: https://zenodo.org/record/7007770.
Publication date of data: 2022-08-18.
Licenses of use: Creative Commons Attribution (CC-BY) 4.0 License.
Metadata language: English.
Date of metadata creation: 2022-08-18.
Hierarchy level: Dataset.
Provided fields: language, institutionCode, collectionCode, basisOfRecord, occurrenceID, catalogNumber, recordNumber, recordedBy, georeferenceVerificationStatus, occurrenceStatus, disposition, associatedReferences, otherCatalogNumbers, occurrenceRemarks, materialSampleID, eventDate, year, month, day, habitat, eventRemarks, continent, country, countryCode, stateProvince, locality, verbatimElevation, locationRemarks, decimalLatitude, decimalLongitude, geodeticDatum, coordinateUncertaintyInMeters, verbatimCoordinates, identificationRemarks, scientificName, kingdom, phylum, class, order, family, genus, specificEpithet, infraspecificEpithet, taxonRank, taxonRemarks.
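As an illustration of how these fields might be consumed, the sketch below reads a Darwin Core occurrence table with Python's standard library and keeps only rows that carry coordinates. It assumes the archive's core table is a tab-delimited text file with a header row, which is common for Darwin Core Archives but should be confirmed against the archive's own metadata. The function name and the sample rows are hypothetical.

```python
import csv
import io


def read_occurrences(stream):
    """Return occurrence dicts with numeric coordinates from a
    tab-delimited Darwin Core table; rows lacking coordinates are skipped."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        lat = (row.get("decimalLatitude") or "").strip()
        lon = (row.get("decimalLongitude") or "").strip()
        if not lat or not lon:
            continue
        rows.append(
            {
                "scientificName": row.get("scientificName", ""),
                "countryCode": row.get("countryCode", ""),
                "decimalLatitude": float(lat),
                "decimalLongitude": float(lon),
            }
        )
    return rows


# Hypothetical two-row sample standing in for the real file.
sample = io.StringIO(
    "scientificName\tcountryCode\tdecimalLatitude\tdecimalLongitude\n"
    "Placeholder taxon one\tCD\t-1.5\t29.25\n"
    "Placeholder taxon two\tRW\t\t\n"
)
print(read_occurrences(sample))
```

With a real file, the stream would be opened with the character encoding stated in the metadata, for example `open(path, encoding="iso-8859-1", newline="")`, where `path` points to the extracted occurrence table.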
We acknowledge all those who have contributed to the development of the ECAT dataset, with special thanks to Ann Bogaerts, Israel Borokini, Luís Catarino, Helen Chadburn, Sara Contu, Rogier de Kok, Sofie De Smedt, Mathias Dillen, Ryan Hills, Lucia Lopez Poveda, Barbara Mackinder, Pierre Meerts, Malin Rivers, and Xander van der Burgt. We also gratefully acknowledge Craig Hilton-Taylor from the IUCN Red List Unit and the IUCN SSC Global Tree Specialist Group for verifying our taxon list and for their guidance in submitting all data to the IUCN Species Information Service (SIS). This work was made possible thanks to funding provided by the Franklinia Foundation.