The FLORIVON flora survey in the Netherlands between 1902 and 1950

Abstract
In 1902, the nationwide citizen science project known as FLORIVON was launched to map the flora of the Netherlands, resulting in the publication of a complete flora atlas in 1980. Until 2004, the atlas dataset of the fieldwork between 1902 and 1950 had only been partly digitised, and observations were aggregated and anonymised. Between 2001 and 2018, the dataset was entirely digitised from the original field forms, including notes on non-native taxa. This paper presents key characteristics and figures of the dataset and provides an overview of the historical survey project, the digitisation process and the subsequent validation of the data. The dataset is currently curated in the National Database Flora and Fauna and published in GBIF.

From the autumn of 1902 until 1907, small numbers of distribution maps were published on an irregular basis to show participants the progress of the work. From 1908 to 1923, only a few participants continued their work, mainly during the so-called Unio summer meetings of the Dutch Botanical Society (Smit and Verschoof 1980). In 1924, a new group of botanists, led by J.L. van Soest and J.G. Sloff, continued the mapping project (Verschoof 1978). Another group, led by W.C. de Leeuw, focused on mapping the changes in the flora after the construction of the Afsluitdijk, a dam that caused the Zuiderzee to transform from a saltwater body into a freshwater lake (Westhoff 1964).
In 1930, the IVON foundation (Institute for Vegetation Research in the Netherlands) was founded by J.W.C. Goethart with the aim of uniting all botanists working on plant surveys. Between 1930 and 1939, many grid cells were surveyed and preliminary maps were compiled and published in several journals (e.g. Sloff 1935). During and after World War II, the survey project slowed down. Although the project never formally ended, 1950 can be considered the final year of the field surveys.
It was only in 1980 that the data were compiled into an atlas of the flora of the Netherlands, with maps at a spatial resolution of 5 × 5 km. The atlas was produced by J. Mennema and co-workers at the National Herbarium in Leiden (Mennema et al. 1980).
In 1988, FLORON was founded as a spin-off from the National Herbarium to continue the vascular plant surveys by volunteers and to build a database by digitising distribution data of vascular plants. At first, the Atlas of the Flora of the Netherlands (published in 1980) was digitised to provide quick access to historical distribution maps. Between 2001 and 2018, all original field forms, opportunistic observations on handwritten notes, letters, vegetation relevés and literature data were digitised by Joop van Heeswijk and compiled into the FLORIVON dataset, which is described in this paper.

Sampling protocol
The basis of the survey scheme was a map of grid cells of 1.3 × 1.01 km covering the Netherlands. Grid cells were assigned to participants by the project organisation. Each grid cell was then surveyed for several hours to one day, with the aim of compiling a complete list of all wild vascular plants occurring in the area. Survey data were recorded on field forms with abbreviations of scientific taxon names printed on them (Fig. 1). Nomenclature followed the second edition of the Prodromus Florae Batavae (Vuyck 1901). Additionally, miscellaneous observations, vegetation relevés and literature records (from 1832 until 1953) were submitted on special forms or in handwritten or typewritten letters. Most observations include the grid cell code, taxon name, date and up to nine names of co-observers (Table 1). In total, 56,103 forms were digitised, of which 47,060 were field survey forms, 8,279 written notes and 764 vegetation relevés. The average number of taxa per form was 47. Most of the field forms contained higher numbers of taxa, while written notes usually reported only 1-5 taxa (Table 2).
Figure 1. Samples of the FLORIVON survey forms. A: field form showing a square with a drawing of the surveyed area, space for writing down the grid square code ('hokje'), location name, observer name(s) and date, followed by two pages of taxon abbreviations that surveyors had to cross out after observation. B: written note with header data containing the grid cell code (e.g. L6.12.31), the observer's name and the survey date. Stamps confirm that the data have been included in printed atlas volumes.

Data processing and quality control
Survey forms were digitised using Turboveg (Hennekens and Schaminée 2001), a computer programme usually used for handling phytosociological relevés, with customised species dictionaries matching the taxonomy and nomenclature of the field survey forms.
All additional written information on the forms, including additional taxon names of, for example, non-native taxa, additional survey dates and remarks, was temporarily included in the Turboveg header record and extracted afterwards. Taxon names were mapped to current names using a translation table between the Prodromus Florae Batavae (Vuyck 1901) and a more recent checklist of vascular plants in the Netherlands (Groen et al. 1999). The original taxon name or its abbreviation is kept in the database. Grid cell codes were translated to geographical coordinates. Observer names were mapped to existing observer identifiers in the National Database Flora and Fauna.
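The name-mapping step can be illustrated with a minimal sketch. It assumes the translation table is available as a simple lookup from historical to current names; the table entries, the `Record` type and the `translate` function are hypothetical placeholders and are not taken from the actual FLORIVON tables or tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of a translation table: historical name -> current name.
# The entries are placeholders, not real rows from the FLORIVON tables.
TRANSLATION = {
    "Historical name A": "Current name A",
    "Historical name B": "Current name B",
}


@dataclass
class Record:
    original_name: str            # name (or abbreviation) as written on the form
    current_name: Optional[str]   # None when the table has no entry


def translate(original_name: str) -> Record:
    """Look up the current name while keeping the original name untouched."""
    # Collapse repeated whitespace for the lookup only.
    key = " ".join(original_name.split())
    return Record(original_name=original_name, current_name=TRANSLATION.get(key))
```

Keeping both fields on the record mirrors the statement above that the original name or abbreviation is retained alongside the current name, so a later change to the translation table can be re-applied without going back to the forms.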
Records without an observation date were assigned to the entire survey period of 1902-1950. Records without a valid taxon name or missing grid cell codes were omitted from the final dataset. Records with locations entirely outside the country or in the sea were also omitted. A total of 5,530 records were cleaned. The number of digitised observations after this first data cleaning step was 2,638,919.
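The cleaning rules in this paragraph can be summarised in a short sketch. It is an illustration under stated assumptions, not the procedure actually used: the `RawRecord` fields, the `on_land_in_country` flag (assumed to come from an earlier spatial check) and the `clean` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SURVEY_PERIOD = (1902, 1950)  # fallback period for undated records


@dataclass
class RawRecord:
    taxon: Optional[str]
    grid_cell: Optional[str]
    year: Optional[int]
    on_land_in_country: bool = True  # hypothetical result of a spatial check


def clean(records: List[RawRecord]) -> Tuple[list, int]:
    """Return (kept, dropped): kept is a list of (record, (start, end)) pairs."""
    kept = []
    dropped = 0
    for rec in records:
        # Omit records lacking a taxon or grid cell code, or located
        # entirely outside the country or in the sea.
        if not rec.taxon or not rec.grid_cell or not rec.on_land_in_country:
            dropped += 1
            continue
        # Undated records are assigned the entire survey period.
        period = (rec.year, rec.year) if rec.year is not None else SURVEY_PERIOD
        kept.append((rec, period))
    return kept, dropped
```

Storing the date as a start and end year lets dated and undated records share one representation, which is one possible way to express "assigned to the entire survey period".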
Validation of the digitised observations was performed with an automated procedure that tried to match each observation against a reference dataset. This reference dataset was based on the printed volumes of the Atlas of the Flora of the Netherlands (Mennema et al. 1980) and on other digitised literature and collection records in the National Database Flora and Fauna that had been validated in the past.
In the FLORIVON dataset, 142,838 observations did not match validated data sources and were considered for manual review. Of these unmatched observations, 110,889 records of common taxa, i.e. taxa occurring in 30% or more of the 5 × 5 km grid squares in the Netherlands, were validated. A further 2,415 records of less common taxa were validated because the taxon was present in neighbouring grid cells. Further unmatched records of rare taxa were validated by Gerard Dirkse by plotting them on a map for visual interpretation (17,427 observations). These observations were validated if they matched the geographical pattern of all other valid observations of the taxon. Herbarium specimens and publications mentioning an observation were also taken into account during validation. In the validation process, 12,107 out of 142,838 records were deleted (ca. 8.5%).
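The decision rules for unmatched observations, and the way the reported counts add up, can be sketched as follows. This is a simplified illustration: the `triage` function and its arguments are hypothetical, and the final visual review by an expert is represented only as a "manual review" outcome. The counts are those reported in the text.

```python
COMMON_THRESHOLD = 0.30  # share of 5 x 5 km grid squares in which a taxon occurs


def triage(matched: bool, grid_share: float, in_neighbouring_cell: bool) -> str:
    """Classify one observation following the rules described in the text."""
    if matched:
        return "validated: matched reference data"
    if grid_share >= COMMON_THRESHOLD:
        return "validated: common taxon"
    if in_neighbouring_cell:
        return "validated: present in neighbouring cell"
    return "manual review"


# Tally of the reported counts for the unmatched observations.
unmatched = 142_838
validated_common = 110_889
validated_neighbour = 2_415
validated_visual = 17_427
deleted = 12_107

validated_total = validated_common + validated_neighbour + validated_visual
# The three validated groups plus the deleted records account for all
# unmatched observations.
assert validated_total + deleted == unmatched
```

The tally treats the three validated groups and the deleted records as mutually exclusive, which is consistent with the figures: together they sum exactly to the 142,838 unmatched observations.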
The validated dataset was added to the NDFF Verspreidingsatlas (http://www.verspreidingsatlas.nl), which is the platform FLORON uses to curate datasets. Simultaneously, the dataset was published through the GBIF Integrated Publishing Toolkit (IPT).

Personnel
Joop van Heeswijk performed the digitisation between 2001 and 2018 as voluntary work. Laurens Sparrius performed the validation of the dataset. Gerard Dirkse assisted with the validation of non-native and doubtful taxa. Naturalis Biodiversity Center (Leiden) is hosting the physical archive with field forms and notes.
Excluded information: Complete observer biographies, source type (field list, publication, specimen, vegetation relevé), location names and remarks were not included in the published dataset, but can be found in the source (curation) database, which can be accessed with the link below. This information was excluded for privacy reasons or because it was deemed irrelevant.
The dataset contains distribution data of 2,502 taxa at species or intraspecific level, divided over 138 plant families. The plant families with the most observations in the dataset are the Asteraceae and Poaceae (Table 3).
Some taxa in FLORIVON are currently accepted as lumped taxa, which makes it impossible to compare taxon distributions for certain taxa (Table 4).

Temporal coverage
The dataset contains observations and literature data from 1832 to 1953. Most of the data were collected between 1902 and 1950 as part of the FLORIVON citizen science project (Fig. 2).

Spatial coverage
The dataset covers the entire country of the Netherlands as it was in the period 1902-1950. At that time, the southern part of the province of Flevoland did not yet exist (Hoeksema 2007). Additionally, minor changes were made to the borders with Germany and Belgium after World War II (Wijchgel 2008). Survey data were collected in small grid cells of 1.3 × 1.01 km (kwartierhok) (Fig. 3), 16 of which can be combined into a larger grid cell of 5.0 × 4.167 km (uurhok), which is used on some forms. The grid system was created in 1902 by the botanical community itself because, until 1920, the topographical military maps lacked a kilometre grid. These grid cells differ from the currently used grid, in which the smallest grid cells are 1 × 1 km and follow the Dutch National Coordinate Reference System (EPSG:28992).
The periods before and after 1925 show different patterns of survey intensity, which should be taken into account when using the data for further analysis (Fig. 4).