Flora-On: Occurrence data of the vascular flora of mainland Portugal

Abstract The Flora-On dataset currently includes 253,310 occurrence records for the class Embryopsidae (vascular plants), comprising data collated via the platform http://flora-on.pt/ relating to observation records of vascular plants across mainland Portugal. Observations are uploaded directly to the database primarily by experienced botanists and naturalists, typically on a weekly basis, and consist of geo-referenced data points for species (or infraspecific taxa) along with their date of observation and phenological state. The Flora-On project aims to compile and make publicly accessible chorological, ecological, morphological and photographic information for the entire vascular flora of Portugal. The project’s website offers powerful query and visualization capabilities, of which we highlight the probabilistic bioclimatic and phenological queries which operate based on the empirical density distributions of species in those variables. Flora-On was created and continues to be maintained by volunteers who are Associate members of Sociedade Portuguesa de Botânica (Botanical Society of Portugal). Given its focus on research-grade and current data, the Flora-On project represents a significant contribution to the knowledge of the present distribution and status of the Portuguese flora.


Project title
Flora-On, Interactive Flora of Portugal

Miguel Porto (Programmer)
Funding The project does not have direct funding from any source, the platform being entirely built and maintained by volunteers. Maintenance costs of the web server are covered by the Associate membership fees of Sociedade Portuguesa de Botânica (Botanical Society of Portugal). However, externally funded projects have contributed through the provision of data.

Study area description
Portugal is located at the southwesternmost extent of Europe (Figure 1) and is bounded by the Atlantic Ocean to the west and south and by Spain to the north and east. Being approximately rectangular in shape, Portugal extends circa 220 km from east to west and 550 km from north to south. It lies within the Mediterranean biogeographic region, with the vast majority of its land falling within the Mediterranean macrobioclimate, but extends in the north into the temperate macrobioclimate. The area of mainland Portugal is approximately 89,015 km² and, together with mainland Spain (492,127 km²), forms a geographically well-defined territory known as the Iberian Peninsula.
The orography of Portugal is heterogeneous, particularly from north to south, with the Mountains and Plains of the Iberian northwest and of the Iberian Central System dominating the northern parts of its territory (Pereira et al. 2014). This region is characterised by rugged landscapes dominated by granitic and metasedimentary geological formations which extend almost as far south as the river Tejo (the largest river in Portugal, which divides the country in half). Serra da Estrela mountain, its highest peak, rises 1,991 m above sea level. The Plains of the southwest Iberian Peninsula occupy almost all the central and southern interior territory (Pereira et al. 2014), this region being characterised by a very smooth landscape with some scattered high relief formations, such as Serra de São Mamede (1,027 m), Serra de Monchique (902 m), and Serra do Caldeirão (589 m). The western and southern coastal regions are otherwise occupied by the Mesozoic and Cenozoic Basins (Pereira et al. 2014) and are characterised by the dominance of maritime and alluvial sedimentary formations and calcareous reliefs.
Across mainland Portugal the vegetation is mainly Mediterranean in terms of both its structure and floristic composition. Semi-deciduous and perennial oak woodlands, "montado", shrublands, grasslands and silvo-agricultural systems occupy most of this area. Mainland Portugal supports approximately 2,900 native vascular plant taxa (Sequeira et al. 2010), 137 of which are considered endemic.

Design description
The Flora-On project aims to compile and make publicly accessible chorological, ecological, morphological and photographic information of the entire vascular flora of Portugal. Occurrence data is regularly uploaded to the website by active collaborators, typically on a weekly basis, and consists of geo-referenced data points of species (or infraspecific taxa) along with their date of observation and phenological state. Additionally, other research projects contributed data to the project from their exhaustive sampling campaigns. An open-source version of the platform is currently under development and can be found at https://github.com/miguel-porto/flora-on-server/
The strength of the Flora-On platform lies in its ability to execute a diversity of query types (Table 1). In addition to the usual deterministic queries in relation to taxonomic, morphological and geographical information, Flora-On enables users to conduct quantitative probabilistic species queries in relation to bioclimatic distribution, altitudinal distribution and flowering dates. With such queries species can be filtered and ranked by the degree of matching criteria defined by the user for one or more quantitative variables (including flowering date). This innovative feature is based upon empirical density distributions of species that are computed internally for each variable (Figure 2), and for the precise observed flowering dates, using a kernel smoother. The density distributions are stored in the database as binary objects to allow fast querying with MySQL native extensions.

Table 1.
| Type of query | Example queries (meaning) | Returns |
| Morphological | arbusto espinhoso, flores amarelas (spiny shrub, yellow flowers) | All taxa that present the specified attribute combination |
| Bioclimatic range | continentalidade>14 (continentality index greater than 14); tempminima: 1.4-3.7, precipitação: 1300-1900 (minimum temperature between 1.4 and 3.7°C and annual precipitation between 1,300 and 1,900 mm) | All taxa whose cumulative density distribution in each variable, within the specified ranges, is greater than a given threshold |
| Bioclimatic similarity | tempmaxima~Cistus albidus (maximum temperature profile similar to that of Cistus albidus) | All taxa whose density distribution of the observations in the specified variable is similar to that of the specified species, given a threshold of similarity |
| Geographical variable ranges | altitude<100, costa>100000 (altitude lower than 100 m and distance to coast greater than 100 km) | All taxa whose cumulative density distribution in each variable, within the specified ranges, is greater than a given threshold |
| | | Taxa with a distribution similar to that of Staehelina dubia, computed as the intersection of their density distributions, given a threshold of similarity |
| Phenological range | 20 julho a 9 agosto (20 July to 9 August) | All taxa whose cumulative density distribution of flowering dates within the specified range is greater than a given threshold, i.e., taxa whose flowering period is concentrated within the specified range |
| Phenological similarity | floração~Scilla autumnalis (flowering profile similar to that of Scilla autumnalis) | All taxa whose density distribution of flowering dates is similar to that of Scilla autumnalis, given a threshold of similarity |
| Phenological precise date | 7 fevereiro (7 February) | All taxa which may be found in flower at the given date (regardless of the flowering distribution throughout the year) |
| Area of occupancy | quadriculas<3 (less than 3 UTM squares) | All taxa that occur only in less than three 10×10 km UTM squares |

Figure 2. Internal structure, data flow and front-end interfaces of Flora-On. Pink boxes represent the front-end interfaces that interact with the user (input and/or output). Green boxes represent the data, either permanent or temporary (dashed box). Blue boxes represent the internal server-side algorithms that parse the user queries, process and summarise the raw data, and deliver the results to the front-end interfaces.
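The probabilistic range queries described above can be illustrated with a minimal sketch. The source states only that a kernel smoother is used, so the Gaussian kernel, the fixed bandwidth, the threshold value and all names below are assumptions made for illustration, not the platform's actual implementation. The sketch scores each taxon by the share of its smoothed density that falls inside the requested range and keeps the taxa above the threshold, ranked by score.

```python
import math

def mass_in_range(samples, bandwidth, lo, hi):
    """Share of a Gaussian-kernel density estimate that lies inside [lo, hi]."""
    def normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Each observation contributes one Gaussian bump; integrate each bump over the range.
    total = sum(normal_cdf((hi - s) / bandwidth) - normal_cdf((lo - s) / bandwidth)
                for s in samples)
    return total / len(samples)

def range_query(observations, lo, hi, bandwidth, threshold):
    """Return (taxon, score) pairs whose density mass in [lo, hi] exceeds the threshold."""
    scores = {taxon: mass_in_range(values, bandwidth, lo, hi)
              for taxon, values in observations.items()}
    hits = [(taxon, score) for taxon, score in scores.items() if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical altitudes (m) at which two made-up taxa were observed.
observations = {
    "taxon_a": [20, 35, 50, 60, 80],
    "taxon_b": [90, 400, 650, 900, 1200],
}

# Roughly equivalent to the query "altitude<100" with an assumed threshold of 0.5.
result = range_query(observations, lo=0, hi=100, bandwidth=25.0, threshold=0.5)
print(result)
```

Because the score is a proportion of the taxon's own density, a widespread taxon with a few lowland records (taxon_b here) is filtered out, while a taxon concentrated in the range is retained.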
Flora-On is designed in such a way that the results of any type of query, irrespective of its complexity, can be visualized across different facets, evidencing aggregated bioclimatic, geographical, phenological or morphological features of the species that match the query. All queries can be expressed through plain text, but to simplify the querying process for general users, four front-end graphical query interfaces are provided to aid query building (Figure 2, top row). The query algorithm, after parsing and processing the input query (Figure 2, middle row), then delivers the results to the output modules of the application, which summarise and display the query results according to the different facets (Figure 2, bottom row):
I) The standard search displays the species photographs ordered by different criteria;
II) The bioclimatic explorer displays jointly the occurrences of the species that match the query in a bioclimatic/environmental space, with the possibility of overlaying multiple queries in the same plots, evidencing the ecological differences between species or groups of species (http://flora-on.pt/#b);
III) The WebGIS displays (with the ability to download the output) the map of the number of species that match the query per Universal Transverse Mercator (UTM) square, e.g. richness of spiny species, richness of summer-flowering species, richness of species occurring in less than five UTM squares, etc. (http://flora-on.pt/#w);
IV) The multi-way interactive identification key allows users to identify species by iteratively narrowing down possible species, freely choosing their way through a set of characters. Displayed characters are adjusted for each iteration according to the list of possible species, and are highlighted according to their discriminant power, to enhance the efficiency of the identification process (http://flora-on.pt/#z);
V) The joint flowering profile displays the aggregated flowering profile of all species matching the query, e.g. flowering profile of the species occurring in areas with an annual precipitation above 1,500 mm (currently accessible through the standard search results).
(11.3%); Poales (10.3%); Fabales (8.6%); Caryophyllales (6.5%); Asparagales (5.8%); Malvales (4.6%); Apiales (3.7%); Rosales (3.4%); Malpighiales (3.1%); Ericales (3%); and Fagales (2.7%).

Data published through
In total, this dataset includes occurrence records for 150 plant families and 2,073 taxa (Figure 3). The families with the greatest numbers of species in mainland Portugal are also those with the greatest numbers of occurrence records within this dataset, including Asteraceae (32,638), Fabaceae (21,529) and Poaceae (20,656), although some genera are still under-represented. This is probably due to the nature of the dataset, given that the greatest part of the contributions results from non-exhaustive field observations, which likely result in the under-representation of the more inconspicuous taxa, or of taxa difficult to identify in the field.

General spatial coverage
The Flora-On dataset covers almost the entire territory of mainland Portugal, although there remains a significant lack of information for some areas, particularly in the central and southern interior regions (Figure 4a). As expected for a dataset that is not complete, the number of species per UTM square (Figure 4b) correlates with the number of records, illustrating the gap in information for some regions. Indeed, whilst the project presently includes occurrence data spread across the whole country, the more intensively surveyed areas include the more coastal regions and some key areas towards the interior. Nonetheless, it is worth noting the high numbers of species in some 10×10 km UTM squares, up to 700 observed species in some cases, revealing the high taxonomic diversity of some parts of the Portuguese mainland territory (Figure 4b).
The number of Portuguese endemic species recorded per UTM square (Figure 5a) illustrates a well-defined pattern, with the highest endemic species richness occurring across the central and southern coastal regions, including Lisbon and Setúbal, the coast of Alentejo, and Algarve (from Sagres to Faro). These areas exhibit particularly isolated climatic and/or geological features, such as the wet coastal mountains of Sintra and Monchique, the inland sand plains of Setúbal, the Atlantic coastal cliffs, and the vast dry limestone regions of Setúbal and Algarve. Additionally, the data illustrate the importance of the mountain ranges in the interior north and of the regions near the border in the northeast quadrant as areas of high Iberian endemic species richness (excluding Portuguese endemics) per UTM square (Figure 5b). Figure 5 further illustrates some coincidence between areas with high Portuguese endemic richness and high Iberian endemic richness.
Coordinates 36°43'12"N and 42°10'12"N Latitude; 9°37'12"W and 6°9'36"W Longitude
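The bounding coordinates above are given in degrees, minutes and seconds, whereas most occurrence tools expect decimal degrees. As a worked illustration, the short sketch below converts the four stated limits and uses them as a simple bounding-box test; the function names and the two test points (approximate coordinates of Lisbon and Madrid) are assumptions added for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W are negative)."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value

# Limits as stated in the text.
south = dms_to_decimal(36, 43, 12, "N")   # about 36.72
north = dms_to_decimal(42, 10, 12, "N")   # about 42.17
west = dms_to_decimal(9, 37, 12, "W")     # about -9.62
east = dms_to_decimal(6, 9, 36, "W")      # about -6.16

def in_bounding_box(lat, lon):
    """True if a decimal-degree point lies inside the dataset's bounding box."""
    return south <= lat <= north and west <= lon <= east

print(in_bounding_box(38.72, -9.14))  # approximate position of Lisbon
print(in_bounding_box(40.42, -3.70))  # approximate position of Madrid
```

A bounding box is only a coarse filter: points in Spain or at sea can fall inside it, so it cannot replace a check against the actual outline of mainland Portugal.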

Temporal coverage
Although the bulk of the dataset corresponds to observations made between 1 January 1995 and 2 February 2016 (Figure 6), historic records prior to this period also exist.

Collection name
Flora-On: Interactive Flora of Portugal

Method step description
For each occurrence, GPS coordinates are recorded by the collaborators wherever possible; otherwise approximate coordinates and their level of precision are recorded. Plants are identified at least to species level. Thereafter, collaborators upload the data to the Flora-On database via a webmapping interface or by uploading a record table.
Coordinates are then generalised to the UTM 10×10 km grid and are made publicly available for download as tables and as a geographical layer through a WFS service: http://flora-on.pt/wfs. High resolution data can be provided upon request, subject to approval by Sociedade Portuguesa de Botânica and involved collaborators.
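The generalisation step can be sketched as follows, assuming the precise coordinates are already expressed as UTM eastings and northings in metres within a single zone; the source does not describe how the platform performs this step, so the function names and sample records are hypothetical. Each point is snapped to the south-west corner of its 10×10 km square, and counting the distinct squares gives the quantity used by area-of-occupancy queries such as quadriculas<3.

```python
def generalise_to_grid(easting, northing, cell_size=10_000):
    """Return the south-west corner (metres) of the grid square containing the point."""
    return (int(easting // cell_size) * cell_size,
            int(northing // cell_size) * cell_size)

def occupied_squares(points, cell_size=10_000):
    """Set of distinct grid squares touched by a list of (easting, northing) points."""
    return {generalise_to_grid(e, n, cell_size) for e, n in points}

# Hypothetical precise records (UTM metres) for one taxon.
records = [(487312.4, 4286951.7), (489950.0, 4280010.0), (490000.0, 4286000.0)]

squares = occupied_squares(records)
print(sorted(squares))
print(len(squares) < 3)  # would this taxon match "quadriculas<3"?
```

Publishing only the square corner hides the exact locality of sensitive taxa while keeping the data usable for richness maps at the 10 km scale.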
Study extent description: This dataset includes observations falling within mainland Portugal, most of which were made after the Flora-On platform was made available online (25 February 2012).
Sampling Description: A large proportion of the records corresponds to non-exhaustive observations of collaborators, although a significant amount of data results from fieldwork completed as part of other externally funded projects. When possible, plants are identified in the field at least to species level. Otherwise, plant material is collected and identification is confirmed in the lab by the collaborators. Phenological state is recorded if plants are flowering at the time of observation.

Quality Control Description
Taxon nomenclature is fully controlled via use of a reference checklist, allowing neither spelling errors nor outdated synonyms. The reference checklist includes only currently accepted nomenclature which corresponds to an updated version of the "Checklist da Flora de Portugal (Continental, Açores e Madeira)" (http://ipt.gbif.pt/ipt/resource.do?r=alfa_checklist_florapt).
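A minimal sketch of this kind of nomenclatural control is given below: a record is accepted only if its taxon name matches an entry of the reference checklist exactly. The record structure, the function name and the three-name checklist are hypothetical and serve only to illustrate the principle; they do not reproduce the platform's upload code.

```python
def validate_names(records, accepted_names):
    """Split records into those whose taxon name is on the checklist and those rejected."""
    accepted = set(accepted_names)
    valid, rejected = [], []
    for record in records:
        # Exact matching: misspellings and names absent from the checklist are refused.
        (valid if record["taxon"] in accepted else rejected).append(record)
    return valid, rejected

# Hypothetical checklist excerpt and uploaded records.
checklist = ["Cistus albidus", "Staehelina dubia", "Scilla autumnalis"]
uploads = [
    {"taxon": "Cistus albidus", "date": "2015-04-12"},
    {"taxon": "Cistus albidis", "date": "2015-04-12"},  # misspelled epithet
]

valid, rejected = validate_names(uploads, checklist)
print(len(valid), len(rejected))
```

Exact matching is deliberately strict: rather than guessing the intended name, a rejected record goes back to the collaborator, which keeps synonyms and typing errors out of the database.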
The responsibility for species identification rests with the collaborators, most of whom have expertise in plant identification. Additionally, the Editorial Board of Flora-On is committed to ensuring a high reliability of uploaded data, hence regularly checking for unlikely or doubtful occurrence records, and asking collaborators to provide pictures, descriptions or specimens whenever needed. The Editorial Board estimates at least 95% of the records to be correctly identified under the most up-to-date nomenclature.

Datasets
The current Flora-On dataset published through GBIF includes occurrence and phenological data. Phenological data, for now, is limited to a 'Yes'/'NA' field in respect of flowering, and is linked to the date of the observation. In addition, Flora-On also utilises a morphological dataset not currently published elsewhere, as well as a number of other quantitative data fields that numerically describe the flowering period, bioclimatic and altitudinal distribution profiles of each taxon (Figure 2).
Morphological data is a compilation of information from different bibliographic sources (Castroviejo 1986-2015, Franco 1971-1984, Franco and Rocha-Afonso 1994-2003) and from direct observation in the field, which includes ca. 15 categorical reproductive and vegetative plant traits, such as colour and number of petals, type of fruit and type of growth. The primary purpose of this trait data was to aid the general public in the identification of taxa, but it is part of the roadmap to enrich the dataset with more traits and make it freely available. Species altitude profiles and bioclimatic profiles are estimated by applying a kernel density to elevation and bioclimatic data, i.e. the set of elevation and bioclimatic variable values at which each taxon was observed. Elevation data is extracted by crossing taxa occurrence data with the ASTER Global Digital Elevation Model (METI, NASA). Bioclimatic data are extracted from the climatic variables and bioclimatic indices compiled and developed by Monteiro-Henriques et al. (2015).
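Profiles estimated in this way can be compared between taxa, which is the basis of the similarity queries listed in Table 1 (described there as the intersection of density distributions). The sketch below is one possible reading of that idea, not the platform's code: it assumes a Gaussian kernel with a fixed bandwidth, evaluates both profiles on a common grid and integrates the pointwise minimum, giving a value between 0 (no shared range) and 1 (identical profiles). All names and elevation values are hypothetical.

```python
import math

def kde_on_grid(samples, grid, bandwidth):
    """Gaussian kernel density estimate of the samples, evaluated at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
            for x in grid]

def profile_overlap(samples_a, samples_b, lo, hi, bandwidth, steps=500):
    """Approximate the integral of min(f_a, f_b) over [lo, hi] with the midpoint rule."""
    step = (hi - lo) / steps
    grid = [lo + (i + 0.5) * step for i in range(steps)]
    density_a = kde_on_grid(samples_a, grid, bandwidth)
    density_b = kde_on_grid(samples_b, grid, bandwidth)
    return sum(min(a, b) for a, b in zip(density_a, density_b)) * step

# Hypothetical elevations (m) of the observations of three made-up taxa.
lowland = [100, 150, 200]
also_lowland = [120, 160, 220]
montane = [1500, 1600, 1700]

print(profile_overlap(lowland, also_lowland, -500, 2500, bandwidth=50.0))
print(profile_overlap(lowland, montane, -500, 2500, bandwidth=50.0))
```

A similarity query would then keep the taxa whose overlap with the reference taxon exceeds a chosen threshold; the integration range must be wide enough to contain essentially all of both densities.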
The Flora-On dataset represents a major contribution to the knowledge of the present distribution of the Portuguese and Iberian flora. Despite the lack of information for several parts of the territory, the Flora-On dataset constitutes the most complete and up-to-date source of research-grade occurrence data on the Portuguese flora, since great care is taken to ensure the correctness of the data. Other existing nationwide platforms covering occurrence data of the Portuguese flora either have partial coverage or do not specifically target validated data. Furthermore, previous data on the Portuguese flora was limited to herbarium and bibliographic sources, which are largely not digitally accessible or accessible only in a very coarse format.
Finally, the Flora-On project has been stimulating the collection of new data on the distribution of species, which has resulted in great improvements in the knowledge of many species. Indeed, the voluntary field work conducted by the collaborators has significantly improved the knowledge about the current status of many species that are rare, protected by national and international legislation, or poorly known, and several species not previously known to occur in Portugal have recently been found.

Dataset description
Object name: Darwin Core Archive Flora-On: occurrence data of the flora of mainland Portugal