Database of Vascular Plants of Canada (VASCAN): a community contributed taxonomic checklist of all vascular plants of Canada, Saint Pierre and Miquelon, and Greenland

Abstract The Database of Vascular Plants of Canada or VASCAN (http://data.canadensys.net/vascan) is a comprehensive and curated checklist of all vascular plants reported in Canada, Greenland (Denmark), and Saint Pierre and Miquelon (France). VASCAN was developed at the Université de Montréal Biodiversity Centre and is maintained by a group of editors and contributors. For every core taxon in the checklist (species, subspecies, or variety), VASCAN provides the accepted scientific name, the accepted French and English vernacular names, and their synonyms/alternatives in Canada, as well as the distribution status (native, introduced, ephemeral, excluded, extirpated, doubtful or absent) of the plant for each province or territory, and the habit (tree, shrub, herb and/or vine) of the plant in Canada. For reported hybrids (nothotaxa or hybrid formulas) VASCAN also provides the hybrid parents, except if the parents of the hybrid do not occur in Canada. All taxa are linked to a classification. VASCAN refers to a source for all name, classification and distribution information. All data have been released to the public domain under a CC0 waiver and are available through Canadensys and the Global Biodiversity Information Facility (GBIF). VASCAN is a service to the scientific community and the general public, including administrations, companies, and non-governmental organizations.


Study area description
The study area occupies the northern half of North America (excluding Alaska). The area of Canada is 9,984,670 km², that of Greenland (or Kalaallit Nunaat, an autonomous country within the Kingdom of Denmark) is 2,166,086 km², and that of Saint Pierre and Miquelon (collectivité territoriale, France) is 242 km². The latter is 20 km off the coast of Newfoundland's Burin Peninsula and its characteristics are those of boreal eastern Canada. From west to east, the main physiographic regions are the Western Cordillera, the sedimentary Interior Plains, the Canadian and Greenland Shields (mostly igneous rocks), the sedimentary Great Lakes and St. Lawrence Lowlands, and the Appalachian Mountains. The sedimentary Hudson Bay Lowlands basin lies at the centre of the shield, a northern area of sedimentary plains and mountains. The Canadian Arctic borders the Arctic Ocean in northern Canada and northern Greenland. An ice cap covers 81% of Greenland.
The dominant vegetation type of the area is the boreal forest, which occupies much of Canada from Yukon and northeastern British Columbia to Newfoundland. To the north, Arctic tundra prevails: it can be divided into low Arctic (with a nearly continuous plant cover, sometimes shrubby) and high Arctic (including polar deserts); these types are the only ones found in Greenland. To the south of the boreal forest, from west to east, are the humid Pacific Coastal forest, the Cordilleran forest, the Prairie grasslands, the eastern temperate forests (southern Ontario and Quebec), and the Atlantic or Acadian forests.
The population of Canada is concentrated in a narrow belt along its border with the United States, where most of the impacts on ecosystems (urbanization, agriculture) are also concentrated. Logging, mining, and hydroelectric development occur in the boreal forest, and mining is now rapidly developing in the Arctic. About 9.9% of the terrestrial area of Canada is protected (Environment Canada 2011; 7.5% according to the World Bank 2013), as is 40% of Greenland. Based on the data in VASCAN, the area harbors a total of 5,124 vascular plant species, 3,829 native and 1,295 introduced (25% of the flora). Of the native species of Canada, 156 are considered legally at risk, with a further 34 of conservation concern (COSEWIC 2009+).

Design description
The goal of the Database of Vascular Plants of Canada (VASCAN) is to provide an up-to-date, documented checklist of the names of vascular plants in Canada, Greenland, and Saint Pierre and Miquelon, both scientific and vernacular, and the distribution of the plants at the provincial/territorial level.
VASCAN was developed from the need to validate vascular plant name and distribution data from eastern Canada (Ontario and eastward), Greenland, and Saint Pierre and Miquelon for the Flora of North America project (FNA) and from the need to provide French vernacular names for taxa present in Quebec in the FNA. It expanded when Parks Canada wanted to harmonize the names from vascular plant species lists of its parks across the country. At the time we also realized that, aside from The Flora of Canada by Scoggan (1978-1979), which was in need of updating, not only was there no standardized scientific name list for the country (despite worthwhile efforts from Kartesz (1999) and USDA NRCS (2011)), but there was also no standardized source of Canadian English and French vernacular names. Names used for plants in English Canada are not necessarily those used in the United States, and thus U.S. sources were not always appropriate for this goal. Finally, several national organizations, such as Parks Canada, Forest Canada, the Committee on the Status of Endangered Wildlife in Canada (COSEWIC), and NatureServe Canada, expressed the need for a web-based list of Canadian taxa, with data on provincial/territorial distribution.

Taxonomic coverage
This checklist covers all vascular plants (Equisetopsida, Tracheophyta) reported in the area described in the section 'Spatial coverage' (Figure 1). The core taxa considered are species, subspecies or varieties, and their hybrids. For these taxa, we provide synonyms, the accepted and alternative French and English vernacular names, and the habit (tree, shrub, herb and/or vine) of the plant in Canada. For reported hybrids (nothotaxa or hybrid formulas) we also indicate the hybrid parents, except if the parents of the hybrid do not occur in Canada. This core information is not provided for higher taxa, although the calculated distribution based on lower taxa can be consulted and downloaded from the VASCAN website (http://data.canadensys.net/vascan).
All taxa are linked to a classification: Chase and Reveal (2009) for the higher classification, Christenhusz et al. (2011a) for lycophytes, Smith et al. (2006) for monilophytes (modified in Rothfels et al. 2012), Christenhusz et al. (2011b) for the gymnosperms, and the Angiosperm Phylogeny Group (2009) for flowering plants. At the generic level and below, the Flora of North America Editorial Committee (1993+) is the main source of classification, unless taxonomic literature more recent than the volume published for a given taxon provides a taxonomy more reflective of current data. The source used is indicated for each taxon in the dataset.
The classification includes 16 ranks. They are, in hierarchical order: class, subclass, superorder, order, family, subfamily, tribe, subtribe, genus, subgenus, section, subsection, series, species, subspecies and variety. Varieties within subspecies are accepted, so quadrinomial names are present, but forms are not included.
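The rank hierarchy above can be encoded as an ordered list, which makes rank comparisons straightforward. The following is only a minimal sketch: the rank names and their order come from the text, whereas the helper names (`RANK_INDEX`, `is_higher_rank`, `is_core_rank`) are hypothetical and are not part of VASCAN.

```python
# The 16 ranks in hierarchical order, as listed in the text (highest first).
RANKS = [
    "class", "subclass", "superorder", "order", "family", "subfamily",
    "tribe", "subtribe", "genus", "subgenus", "section", "subsection",
    "series", "species", "subspecies", "variety",
]

# Hypothetical lookup: position of each rank, 0 being the highest.
RANK_INDEX = {rank: i for i, rank in enumerate(RANKS)}


def is_higher_rank(a: str, b: str) -> bool:
    """Return True if rank `a` sits above rank `b` in the hierarchy."""
    return RANK_INDEX[a] < RANK_INDEX[b]


def is_core_rank(rank: str) -> bool:
    """Core taxa in the checklist are species, subspecies and varieties."""
    return rank in {"species", "subspecies", "variety"}
```

For instance, `is_higher_rank("family", "genus")` is true, and `is_core_rank("genus")` is false because genera are higher taxa for which the core information is not provided.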

Common names
Vascular plants, lycopods, ferns, conifers, flowering plants. In the dataset, French and English vernacular names are provided for families, species, subspecies, and varieties.

Spatial coverage
The checklist covers all vascular plants reported in Canada, Greenland (Denmark), and Saint Pierre and Miquelon (France) (Figure 2). The latter two regions are added because their floras are intimately related to that of Canada and it is useful for Canadians and others to know about them. Provincial distributions are provided to help Canadians visualize the relationship among the floras of their provinces and territories. VASCAN does not intend to replace regional or provincial lists but to act as a complement to them. The covered regions are, in alphabetical order: Alberta, British Columbia, Greenland, Labrador, Manitoba, New Brunswick, Newfoundland, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saint Pierre and Miquelon, Saskatchewan, and Yukon. The distribution status of the plant is indicated per region. These statuses can be grouped as present (native, introduced or ephemeral), previously reported but currently considered absent (excluded, extirpated), doubtful, or not reported (absent). The latter status is not recorded in the database (null value). Excluded taxa are those considered not currently occurring in a region, because of non-recurring ephemeral occurrence, misidentification, or lack of supporting documentation, or because specimens are old and the taxon has not been observed again in more than 50 years. All distribution statuses are defined at http://data.canadensys.net/vascan/about/#distribution. The VASCAN website (http://data.canadensys.net/vascan) provides a distribution map for each taxon. For higher taxa, these are calculated based on lower taxa, with the distribution statuses ordered as follows: native, introduced, ephemeral, excluded, extirpated, doubtful, absent. For example, if two species within the same genus are respectively native and doubtful in a certain region, the genus is considered native for that region.
The website also provides a checklist builder (http://data.canadensys.net/vascan/checklist), where users can generate their own list of taxa based on several criteria (taxonomy, region, distribution status, or a combination of these) and download this as a Darwin Core Archive or text file.
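The calculation of a higher taxon's status from its lower taxa can be sketched as follows. This is an illustration under stated assumptions, not the VASCAN implementation: it assumes that the status appearing earliest in the order given above wins (which is consistent with the native/doubtful example) and that "absent" is represented by `None`, mirroring the null value mentioned in the text. The names `STATUS_PRECEDENCE` and `rolled_up_status` are hypothetical.

```python
from typing import Iterable, Optional

# Precedence order from the text, highest priority first.
# "absent" is not stored (null value), so it is modelled here as None.
STATUS_PRECEDENCE = [
    "native", "introduced", "ephemeral",
    "excluded", "extirpated", "doubtful",
]
_PRIORITY = {status: i for i, status in enumerate(STATUS_PRECEDENCE)}


def rolled_up_status(child_statuses: Iterable[Optional[str]]) -> Optional[str]:
    """Status of a higher taxon in one region, given its lower taxa there.

    Assumption: the child status that comes first in STATUS_PRECEDENCE
    is taken; if no child has a recorded status, the result is None (absent).
    """
    recorded = [s for s in child_statuses if s is not None]
    if not recorded:
        return None
    return min(recorded, key=lambda s: _PRIORITY[s])
```

With this sketch, `rolled_up_status(["native", "doubtful"])` returns `"native"`, matching the genus example in the text, and a genus whose species are all unreported in a region stays `None`.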

Temporal coverage
17th to 21st century.

Study extent description
See the sections 'Spatial coverage' and 'Project details - Study area description'.

Sampling description
The data are sampled manually from literature by the editors, though recent additions are based on specimens maintained at institutional herbaria across Canada (see Thiers).
All floras covering Canada, Greenland, and Saint Pierre and Miquelon were considered for literature-based data entry, but only the most recent provincial and territorial floras (see the section 'References - References used to build the dataset') were systematically searched to establish the distribution status of each taxon in each region (see the section 'Spatial coverage'). Scoggan's Flora of Canada (1978-1979) was systematically searched, as were Kartesz (1999) and the Flora of North America (FNA Ed. Comm. 1993+). English and French vernacular names are based on usage in Canada and, for introduced taxa, on vernaculars from the countries of origin (when the taxon is from Europe). Alternate (synonym) vernaculars are provided when several names are in usage (notably regional names), but an accepted vernacular is recommended for general usage throughout the country. The method of selection of vernacular names follows Darbyshire et al. (2000). The source of the information is referenced for all scientific names, vernacular names and distributions in the dataset.

Quality control description
New findings or corrections for plant distributions are communicated to the editors by contributors from each region (Appendix). Contributors are local botanists, often associated with Canadian herbaria or Conservation Data Centers. All new reports must be documented by specimens deposited at institutional herbaria.
Suggestions or corrections regarding names, taxonomy, or functionality of the VASCAN website are submitted by users and reviewers through a public Google Code issue tracker at http://code.google.com/p/canadensys/issues/list?can=2&q=label:vascan. Name suggestions are validated by the editors against names in Tropicos (http://www.tropicos.org), IPNI (http://www.ipni.org), GRIN (http://www.ars-grin.gov), or other plant name databases, before being manually corrected in the database.

Dataset
The data are stored in a relational database (MySQL), which powers the search, checklist builder, taxon and name pages of the VASCAN website. Editors update a development copy of the database through a secure web application. This allows them to make revisions without affecting the users of the website. Once the editors agree that the data are consistent, a check in which the application assists them, they can push that version of the database to production.
At that moment, the application will also automatically generate a Darwin Core Archive of the data, using the GBIF GNA Profile (GBIF 2010) and following best practices for publishing checklists (GBIF 2011). This archive (Figure 3) includes all data, except for calculated distributions, hybrid parents, and user credentials. The archive is then manually uploaded to the Canadensys Repository (http://data.canadensys.net/ipt), a GBIF Integrated Publishing Toolkit, and republished, at which time it will be assigned a new version number (version 24 at the time of publication). The dataset is registered with the Global Biodiversity Information Facility (GBIF), which allows that organization to harvest, display and distribute the data at any time.
To the extent possible under law, the Université de Montréal Biodiversity Centre has waived all copyright and related or neighboring rights to this dataset, releasing it to the public domain under a CC0 waiver. Users of the data are encouraged to follow the Canadensys norms for data use and publication (http://www.canadensys.net/norms):
Give credit where credit is due: As is common practice in scientific research, cite the data you are using.
Be responsible: Use the data responsibly. The data are published to allow anyone to better study and understand the world around us, so please do not use the data in any way that is unlawful, harmful or misleading. Understand that the data are subject to change, errors and sampling bias. Protect the reputation of the data publisher and clearly indicate any changes you may have made to the data.
Share knowledge: Let us know if you have used the data. It helps us to showcase our efforts and it helps you reach a wider audience. Inform us if you have comments about the data, notice errors, or want more information.
Respect the data license: Understand and respect the data waiver under which the data are published. To help you make greater use of the data, we have dedicated the data to the public domain (CC0). Do not remove the public domain mark or provide misleading information about the copyright status.
Figure 3. The VASCAN Darwin Core Archive, structured following the GBIF GNA Profile. It is a compressed folder containing 4 text files with tab-separated values and 2 xml files. Taxon and scientific name information is provided in taxon.txt, with one record for each taxon and child-parent relationships representing the classification. Records in the extension files distribution.txt, vernacularname.txt and description.txt have a many-to-one relation with the records in taxon.txt and provide additional information for each taxon. The archive structure and term definitions are described in meta.xml. The dataset metadata are provided in eml.xml.
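The archive layout described for Figure 3 (one core file plus extension files in a many-to-one relation with it) can be read with standard tab-separated parsing. The sketch below is illustrative only and uses small inline strings in place of the real files. The column names (`id`, `coreid`, `parentNameUsageID`, `scientificName`, `taxonRank`, `locality`, `occurrenceStatus`) are assumptions, as are all rows, names and identifiers, which are invented placeholders; the actual columns are those declared in meta.xml.

```python
import csv
import io
from collections import defaultdict

# Placeholder content standing in for taxon.txt (core file).
# Column names and rows are assumptions made for this illustration.
TAXON_TXT = (
    "id\tparentNameUsageID\tscientificName\ttaxonRank\n"
    "1\t\tExamplus\tgenus\n"
    "2\t1\tExamplus albus\tspecies\n"
    "3\t1\tExamplus niger\tspecies\n"
)

# Placeholder content standing in for distribution.txt (extension file).
DISTRIBUTION_TXT = (
    "coreid\tlocality\toccurrenceStatus\n"
    "2\tQuebec\tnative\n"
    "2\tOntario\tintroduced\n"
    "3\tQuebec\tdoubtful\n"
)


def read_tsv(text: str) -> list:
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


# One record per taxon, keyed by its identifier.
taxa = {row["id"]: row for row in read_tsv(TAXON_TXT)}

# Many-to-one: several distribution records may point at one taxon.
distributions = defaultdict(list)
for row in read_tsv(DISTRIBUTION_TXT):
    distributions[row["coreid"]].append(
        (row["locality"], row["occurrenceStatus"])
    )

# Child-parent relationships represent the classification.
children = defaultdict(list)
for taxon_id, row in taxa.items():
    if row["parentNameUsageID"]:
        children[row["parentNameUsageID"]].append(taxon_id)
```

With real data, the inline strings would be replaced by the files extracted from the archive, and the same grouping step would apply to vernacularname.txt and description.txt.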