Review Article
Corresponding author: Maria S. Vorontsova (m.vorontsova@kew.org). Academic editor: Sandy Knapp
© 2015 Maria S. Vorontsova, Derek Clayton, Bryan K. Simon.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Vorontsova MS, Clayton D, Simon BK (2015) Grassroots e-floras in the Poaceae: growing GrassBase and GrassWorld. PhytoKeys 48: 73-84. https://doi.org/10.3897/phytokeys.48.7159
GrassBase and GrassWorld are the largest structured descriptive datasets in plants, publishing descriptions of 11,290 species in the DELTA format. Twenty-nine years of data compilation and maintenance have created a dataset which now underpins much of Poaceae bioinformatics. GrassBase and GrassWorld can continue to grow productively if the proliferating alternative classifications and datasets can be brought together into a consensus system. If the datasets are reconciled instead of diverging further, a long-term cumulative process can bring knowledge together for great future utility. This paper presents the Poaceae as the first and largest model system for e-taxonomy and the study of classification development in plants. The origin, development, and content of both datasets are described and key contributors are noted. The challenges of alternative classifications, data divergence, collaborative contribution mechanisms, and software are outlined.
Classification, DELTA, grasses, e-taxonomy, Scratchpads
Grasses and floristic knowledge. Grasses (Poaceae) are undisputedly the most economically important family of flowering plants (www.fao.org), including wheat (Triticum aestivum L.), rice (Oryza sativa L.), maize (Zea mays L.), sugar cane (Saccharum officinarum L.), bamboos, forage grasses, and lawn grasses. As a group so fundamental to human civilisation, the grasses have always been well studied and the cumulative body of knowledge on the circa 12,000 species (
DELTA as a standardization tool for descriptions. The idea of storing standardised species descriptions as a database which could be automatically converted to text was first developed by Mike Dallwitz for insects (
How GrassBase grew. Poaceae Flora treatments written at the Kew Herbarium provided the starting material for Derek Clayton to compile the family generic conspectus Genera Graminum (
GrassBase content. The March 2014 release of GrassBase includes 64,213 Poaceae names published at species rank or below, descriptions of 11,313 accepted species, and 713 genera (
GrassBase maintenance. New Poaceae names recorded by the International Plant Names Index (
Holes in GrassBase. The original compilation of GrassBase was an ambitious pioneering project aiming to demonstrate the feasibility and usefulness of an electronic flora in comparison to a printed book. Timely project completion was a priority and much was inevitably omitted from the database design. Descriptions lack source attributions and no references are provided for the synonymy. Hybrids are not included. The original set of names in GrassBase was taken from IPNI, which did not list infraspecific taxa prior to 1970; the focus of GrassBase remains on the species level, so numerous subspecies and varieties are not included except as synonyms of their species. When a new species is published in a genus not accepted by GrassBase, it is moved to the GrassBase genus under a provisional unpublished new combination to maintain the consistency of generic concepts across the database. This process has created approximately 150 accepted species names which do not correspond to current usage and have not been validly published. There are no plans to publish these names and they have been omitted from all derivative data sets. Every new name recorded by IPNI is reviewed by Derek Clayton before incorporation into GrassBase, but publications which change synonymy without publishing a nomenclatural novelty are unfortunately not always noted. The consistency of the dataset on the global scale reflects the available knowledge and literature, which is far from uniform, both in the information provided by individual descriptions and in the coverage and age of treatments for different parts of the world.
Grasses in many languages, reclassified, with pictures. The GrassWorld project was started by Bryan K. Simon in 2003 to build on GrassBase and gather together all information on the world’s grasses within the DELTA system, to enable the user to query any data type via INTKEY (
Grasses in a Scratchpad. The development of the collaborative platform Scratchpads (
GrassWorld development. GrassWorld continues to grow as new data are imported from GrassBase and as published literature and online resources are added. GrassWorld and part of AusGrass2 have been supported by Bryan Simon in the absence of project funding, and unfortunately the development of GrassWorld in its present form will not continue beyond ca. 2020. A future merger of the GrassBase and GrassWorld data may be the best option for preserving data.
Divergence of grass classifications. The purpose of GrassBase was originally defined as a “practical catalogue of identifiable taxa, stable and conservative, a flora which the database seeks to emulate”, in contrast to “a phylogeny according to the latest theory; volatile and not always practical”. The GrassBase classification follows Genera Graminum (
How far have GrassBase and GrassWorld diverged? An estimated 10% of species in GrassBase have generic placements different from those in GrassWorld, Catalogue of New World Grasses and the forthcoming
Divergence of other name databases. Edited versions of GrassBase data contribute to the complexity of taxonomic datasets in the grasses. A copy of the GrassBase SYNON Access database name data made in 2006 has provided the Poaceae data for the World Checklist of Selected Plant Families (
Can divergent datasets become a consensus classification? The curator of each database decides which names are accepted. With some 12,000 species the Poaceae are too large for one person to hold in-depth knowledge across the family: in GrassBase a considerable part of the decision making is carried out by data curators who are not taxonomic group specialists. The work of scanning new publications and making decisions on the accepted names is carried out by a different person for each database, sometimes producing a confusing diversity of taxonomic opinions. Considerable resources are spent by users trying to decide which classification to adopt and which names are correct. Developing a process of working towards a single consensus opinion could bring benefits: direct changes to the classification by taxon specialists, as well as time savings for data curators and for users. Database maintenance and the translation of new species descriptions into DELTA format are time-consuming tasks which could be distributed between different people.
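The comparison that users currently make by hand can be illustrated with a minimal Python sketch. It assumes each database can be exported as a simple mapping from a published name to the name its curator accepts. The dataset labels, the placeholder names, and the functions `find_conflicts` and `divergence` are all hypothetical and are not part of GrassBase, GrassWorld, or any other real dataset or tool.

```python
from collections import defaultdict

# Hypothetical toy records: each dataset maps a published name to the
# name its curator currently accepts. These rows are placeholders and
# do not come from any real database.
databases = {
    "dataset_a": {
        "Genus1 alpha": "Genus1 alpha",
        "Genus2 beta": "Genus1 beta",   # curator A sinks Genus2 into Genus1
        "Genus3 gamma": "Genus3 gamma",
    },
    "dataset_b": {
        "Genus1 alpha": "Genus1 alpha",
        "Genus2 beta": "Genus2 beta",   # curator B keeps Genus2
        "Genus3 gamma": "Genus3 gamma",
    },
}


def find_conflicts(dbs):
    """Return {published name: {accepted name: [datasets]}} for every
    published name whose accepted name differs between datasets."""
    opinions = defaultdict(lambda: defaultdict(list))
    for source, names in dbs.items():
        for published, accepted in names.items():
            opinions[published][accepted].append(source)
    return {
        name: {acc: sorted(srcs) for acc, srcs in by_accepted.items()}
        for name, by_accepted in opinions.items()
        if len(by_accepted) > 1
    }


def divergence(dbs):
    """Fraction of names held by at least two datasets on which the
    curators disagree about the accepted name."""
    counts = defaultdict(int)
    for names in dbs.values():
        for published in names:
            counts[published] += 1
    shared = [name for name, count in counts.items() if count >= 2]
    if not shared:
        return 0.0
    conflicts = find_conflicts(dbs)
    return sum(1 for name in shared if name in conflicts) / len(shared)


for name, views in find_conflicts(databases).items():
    print(name, views)
print(f"divergence: {divergence(databases):.2f}")
```

A list of this kind would not settle any taxonomic question, but it could direct taxon specialists to the names where a decision is needed, rather than leaving each user to discover the disagreements independently.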
How do we collaborate towards a consensus? Compilers of any dataset cannot fail to introduce unintentional biases reflecting their areas of expertise. Regional floristic specialists are commonly in disagreement with taxonomic group specialists regarding species delimitation. Taxonomists in biodiversity-rich countries can lack adequate internet connections to view the outputs of e-taxonomy, let alone participate. Engaging the full range of people who can provide useful information for species-level descriptions is challenging. Considerable resources are needed to incorporate published information into an electronic dataset. Data contributors should be fully acknowledged and have ownership of their contributions, while data quality and consistency still need to be maintained across the dataset. The area of collaborative e-taxonomy is still in development.
Lack of consensus at species-level? The global community is now broadly in agreement regarding subfamilies and tribes of the Poaceae (
Grass Genera of the World: data incompatibility challenge. While alternative classifications are a frequent focus of debate, divergent and incompatible datasets are arguably a greater concern when viewed in the context of long term information accrual. The Grass Genera of the World DELTA dataset (
Bamboos in GrassBase. The GrassBase character set was designed with specimen identification as a primary consideration, and some morphological terminology specific to the bamboos was altered for compatibility with non-bamboo grasses, with advice from Christopher M. A. Stapleton. This has enabled the use of INTKEY to distinguish between bamboos and other grasses, but has created discrepancies between terminology in GrassBase and that used in bamboo specialist literature (e.g.
The software challenge. The DELTA software suite has been in development for over 30 years and lacks full functionality under 64-bit Windows (
The web integration challenge. Multiple contributor e-taxonomy websites, where multiple users in different countries are able to edit the same website simultaneously, publish descriptions as plain text. DELTA datasets are translated into text prior to web publication and the reverse process of obtaining DELTA code from text descriptions is currently not possible. The DELTA system lacks multiplatform interoperability while
Growing an e-flora into a multipurpose e-infrastructure platform. The original name for GrassBase was “World Grass Flora” to reflect its design as a database equivalent of a traditional flora: a species inventory and an identification guide. GrassWorld has started to expand the range of information available. If these resources are to integrate with the modern eBiosphere online and contribute effectively to the World Flora Online (Global Strategy for Plant Conservation, http://www.plants2020.net), a radical modernisation of web presentation will be necessary, including links to observation data, plant ontogenies, and provision of machine-readable data output.
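What machine-readable output might mean in practice can be suggested with a small Python sketch, in which one structured record feeds both a readable description and a JSON export. The record layout, the character names, and the functions `to_text` and `to_json` are assumptions made for illustration only. They do not reproduce the DELTA format, the GrassBase character list, or any existing export tool.

```python
import json

# Hypothetical, simplified record. Character names and values are
# illustrative placeholders, not the GrassBase character list.
record = {
    "taxon": "Example species",
    "characters": [
        {"character": "habit", "state": "perennial"},
        {"character": "culm height", "min": 30, "max": 80, "unit": "cm"},
    ],
}


def to_text(rec):
    """Render the structured record as a one-line natural-language description."""
    parts = []
    for ch in rec["characters"]:
        if "state" in ch:
            # Categorical character: name followed by its state.
            parts.append(f"{ch['character']} {ch['state']}")
        else:
            # Numeric character: name followed by a range and a unit.
            parts.append(f"{ch['character']} {ch['min']}-{ch['max']} {ch['unit']}")
    return f"{rec['taxon']}: " + "; ".join(parts) + "."


def to_json(rec):
    """Serialise the same record as JSON for use by other software."""
    return json.dumps(rec, indent=2, sort_keys=True)


print(to_text(record))
print(to_json(record))
```

The point of the sketch is the direction of travel: text can always be generated from structured data, whereas the structure is lost once only the text is published.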
This paper argues that a collaborative approach and careful thought across the Poaceae taxonomic community are needed to take grass e-taxonomy forwards. Failure to plan and collaborate could lead to an increasing proliferation of contradictory classifications. Unique datasets of great value could be rendered obsolete if software development and database maintenance do not keep pace with technology platforms. Investment in community database integration and infrastructure could unlock the untapped research and data mining potential of many historic datasets. The rich data and the long history of database compilation in the grasses present an unprecedented opportunity to study the development of classifications and to develop e-taxonomic models. The authors would like to invite potential collaborators to discuss dataset improvements and future plans.
Mike Lazarides (CSIRO) initiated and supported the start of GrassBase. We would like to thank Dave Simpson (Kew) for project steering, Helen Williamson for early work on GrassBase, Kehan Harman for software development on both GrassBase and GrassWorld, and Nick Black and Michael Bradford (Kew) for software support. Many thanks to Daniel Healy and Yucely Alfonso for the data work on GrassWorld, Irina Brake for Scratchpad work, Dimitris Koureas and Isa Vandevelde for work with Scratchpads 2, and Philip Sharpe, Hildemar Scholz, Philipe Morat, and Gilberto Ocampo for translations. Also many thanks to Mary Barkworth (Intermountain Herbarium), Terry MacFarlane (Western Australian Herbarium), Rob Soreng (Smithsonian Institution), Lynn Clark (Iowa State), and Elizabeth Kellogg (Donald Danforth Plant Science Center) for discussion and support, and to Fernando Zuloaga and Donat Agosti for reviewing the manuscript. Many thanks also to everyone who has contributed corrections to GrassBase: please continue sending them in.
Aims of GrassBase as summarised by Derek Clayton in 2012.
Grassbase is a natural extension of Genera Graminum and the big Regional Floras produced at Kew, expanding them to a global treatment at species-level, and exploring the concept of a continuously updated e-Flora for a family of some 11000 species. Its design is based, firstly, on defining its intended applications; secondly on identifying all the retrieval and exploratory tasks commonly undertaken in the course of these applications, then building the database to service these tasks. Applications and tasks are outlined below:
As a public Flora. Floras are the essential reference manuals underpinning all plant science. Grassbase emulates their traditional content, embracing nomenclature, description, identification and distribution.
Tasks. Display the classification currently adopted at Kew. Identify specimens. Clarify nomenclature. Investigate plant geography.
As a source for expediting the production of local Floras or Field Keys for public use.
Tasks. List the local flora. Deliver chunks of text for pasting and editing on an external file. Write keys.
As a workspace and toolkit for research on the family’s morphological classification from species to tribal level. It involves detection and resolution of inconsistencies in nomenclature, identity or relationships, and reconciliation with external datasets such as DNA.
Tasks. Enable the user to rummage for relevant information throughout the system seeking solutions to problems. Assist comprehension of the overall classification. Support the hatching and testing of innovative ideas.