Insights from natural history collections: analysing the New Zealand macroalgal flora using herbarium data

Abstract Herbaria and natural history collections (NHC) are critical to the practice of taxonomy and have potential to serve as sources of data for biodiversity and conservation. They are the repositories of vital reference specimens, enabling species to be studied and their distribution in space and time to be documented and analysed, as well as enabling the development of hypotheses about species relationships. The herbarium of the Museum of New Zealand Te Papa Tongarewa (WELT) contains scientifically and historically significant marine macroalgal collections, including type specimens, primarily of New Zealand species, as well as valuable exsiccatae from New Zealand and Australia. The herbarium was initiated in 1865 with the establishment of the Colonial Museum and is the only herbarium in New Zealand where there has been consistent expert taxonomic attention to the macroalgae over the past 50 years. We examined 19,422 records of marine macroalgae from around New Zealand collected over the past 164 years housed in WELT, assessing the records in terms of their spatial and temporal coverage as well as their uniqueness and abundance. The data provided an opportunity to review the state of knowledge of the New Zealand macroalgal flora reflected in the collections at WELT, to examine how knowledge of the macroalgal flora has been built over time in terms of the number of collections and the number of species recognised, and identify where there are gaps in the current collections as far as numbers of specimens per taxon, as well as with respect to geographical and seasonal coverage.


Introduction
Herbaria and natural history collections (NHC) are critical to the practice of taxonomy. They are the repositories of vital reference specimens, enabling species to be studied and their distribution in space and time to be documented and analysed, as well as enabling the development of hypotheses about species relationships. Krishtalka and Humphrey (2000) describe natural history museums as "sentinel observatories of life on Earth" and also "stewards of its future". Repeatable and testable biological sciences are reliant on taxonomy and vouchered specimens. Within the past decade or so, there has been increasing recognition of the value of collections in the analysis of biodiversity, with interest in their potential applications, for example in conservation and ecology and in inferring threats associated with anthropogenic change (e.g. McCarthy 1998, Shaffer et al. 1998, Ponder et al. 2001, Graham et al. 2004, Frey 2009, Newbold 2010, Pyke and Ehrlich 2010, Johnson et al. 2011, Tomizuka et al. 2012, Ward 2012). A number of studies have explored the ways in which NHC may be used to evaluate responses of biota to climate change, including examination of apparent shifts in species ranges, detecting the presence of possible introduced species, and prediction of changes in species distributions and patterns of species richness under future climate scenarios (e.g. Graham et al. 2004, Johnson et al. 2011). There are many challenges when using NHC for analyses of biota, particularly the potential sources of error, inaccuracy and bias (Graham et al. 2004, Boakes et al. 2010, Newbold 2010, Pyke and Ehrlich 2010, Huisman and Millar 2013). On the one hand, NHC provide an "unambiguous record of a taxon at a particular place and time" with the advantage that vouchered material enables identifications to be verified and additionally taxonomic concepts can be updated (Johnson et al. 2011).
However, there are significant issues about the quality of the identifications, that is, whether these have been provided by a subject specialist, and also whether taxonomic concepts, changes in nomenclature, and synonymies are being updated.
In terms of spatial and temporal data, older collections tend to be geo-referenced post-collection, which may introduce location errors. The data associated with specimens are often highly variable as far as the level of detail provided, for example, the precision of the locality of the collection, habitat information, associated species, collection method, and whether multiple collections were made from within different habitats within a site. Some early collections (19th century, early 20th century) have only the year or month of collection provided with the specimen. Biases affect different aspects of the collections. Spatial biases can result from the position of access roads and settlements, particularly in the case of coastal collecting. Access to collection methods and equipment is also critical, for example, in the case of marine macroalgae in New Zealand there are large sections of coastline where access is only possible from the sea, and thus the use of boats is critical. Weather can have a significant impact on accessibility of sites and can lead to seasonal biases. For an important part of the flora, sampling is only possible via SCUBA, or via dredge equipment for deep-water samples. The location of active collectors has an impact on the number of collections obtained from particular regions, as does the perception of areas being of specific interest. Temporal biases can result from experts working actively on particular taxonomic groups, as well as from particular curatorial practices and personal interests (e.g. discarding damaged individuals, only accessioning a certain number of individuals) (Ward 2012).
Collections result from targeted investigations as well as from opportunistic sampling. Sometimes remote areas are infrequently visited but have detailed and thorough collections because major effort is required to reach the area and very deliberate collections are undertaken. There are inevitably biases as far as which species are collected, with the potential for larger or more conspicuous species to be over-represented and for smaller or more difficult to collect species to be under-represented. In addition, sometimes common species are under-collected whereas rare or unusual species are collected more frequently. Graham et al. (2004) consider that "nonrepresentative sampling in environmental space remains the most difficult source of error to detect and correct". The material in NHCs provides only presence data, establishing that the species was present at that locality when collected. Interpretation of species absences is complex: the species may not have been at the locality, or was not collected, or was not detected.

Macroalgal collections from New Zealand
In the mid to late 19th century there was a period of research on New Zealand algae by overseas research workers, particularly W.H. Harvey (Trinity College, Dublin), J.D. Hooker (Kew, London) and J.G. Agardh (Botanical Museum, Lund). In some cases material was deposited in New Zealand collections, e.g. collections made by William Colenso were sent to Hooker and Harvey and some material retained in New Zealand; collections made by Sven Berggren in 1874 were sent to Agardh with some duplicate material returned to New Zealand (Bagnall 1970). However, this was followed by many years when there was very little activity on the description or study of New Zealand algae. Collections made by New Zealanders rather than foreign visitors began with the work of R.M. Laing and W.A. Scarfe, and compilations of species were prepared by Laing (e.g. Laing 1900, 1902, 1909, 1926, 1930).
From the mid-1930s very significant collections were made by Victor Lindauer (Cassie 1971, Cassie Cooper 1995), who corresponded with several international phycologists and also received specimens from Eileen Willa on Stewart Island. From the 1930s onwards Lucy Cranwell and Lucy Moore, at that time students at Auckland University College, made important collections and ecological observations. The establishment of algal research within Botany Division of the Department of Scientific and Industrial Research (DSIR) saw the development of collections by Moore and Nancy Adams. In the late 1940s both Dr T. Levring and Professor G.F. Papenfuss visited New Zealand. The collections made by Papenfuss provided material for many of his graduate students who made major contributions to the understanding of the New Zealand macroalgal flora (e.g. Wagner 1954, Norris 1957, Sparling 1957, Hommersand 1963). Taxonomic treatments of the flora began to be published (e.g. Levring 1955, Chapman 1956, 1969, 1979, Lindauer et al. 1961, Chapman and Dromgoole 1970, Chapman and Parkinson 1974), but to date there has been no comprehensive treatment of the macroalgal flora. The major macroalgal collections are held in New Zealand in the herbaria of the Museum of New Zealand Te Papa Tongarewa (WELT), Landcare Research Manaaki Whenua (CHR) and at the Auckland Museum (AK) (Thiers 2012). However, WELT is the only herbarium in New Zealand where there has been consistent expert taxonomic attention to the macroalgae over the past 50 years (Nancy Adams 1969-1987, Wendy Nelson 1987-2002, Jenn Dalen 2002). The herbarium was initiated in 1865 with the establishment of the Colonial Museum, subsequently known as the Dominion Museum, National Museum of New Zealand (1973-1992) and the Museum of New Zealand Te Papa Tongarewa (Te Papa) (from 1992).
Algal specimens were received into the collections in the 1860s and 1870s, from the inception of the herbarium, including a collection from the British Museum of more than 200 algal specimens from throughout the British Isles dating from 1806-1860 and algal specimens from the Thompson/J.G. Baker herbarium. The Te Papa herbarium contains scientifically and historically significant marine macroalgal collections including type specimens, primarily of New Zealand species, as well as valuable exsiccatae from New Zealand and Australia.
In the absence of a complete flora, there has been considerable recent effort directed to compiling and updating lists of currently accepted names and the taxonomic hierarchy, with published lists produced as part of the Species 2000 project documenting the New Zealand biota (Harper et al. 2012, Nelson 2012), and also updated current lists provided on the Te Papa website (e.g. Dalen and Nelson 2013a-c). Much of the New Zealand macroalgal taxonomic and biogeographic literature is based on the WELT collections, including Adams (1994) and a series of regional floral lists (Adams 1972, Adams et al. 1974, South and Adams 1976, Nelson and Adams 1984, Adams and Nelson 1985, Hay et al. 1985, Nelson and Adams 1987, Nelson et al. 1991, Nelson et al. 1992, Neale and Nelson 1998, Nelson et al. 2002) based on targeted collections. In addition, some specific projects were undertaken to improve collections and knowledge of the flora (e.g. coralline algae, Harvey et al. 2005, Broom et al. 2008; macroalgae from soft sediment environments, Neill et al. 2012; Ulvaceae, Heesch et al. 2007, 2009). The WELT collections have been databased over a period of ca. 15 years. The recent focus within the herbarium has been on improving collection data and checking the dataset for errors, particularly grooming collection date data and mapping and verifying locality data. The collections at both AK and CHR have not been fully databased to date and have not received the level of scrutiny and identification that has been directed to the WELT collections. The AK and CHR collections currently do not have specialist marine phycologists associated with the collections. Because WELT collections have received expert identification and curation, they have been used as the primary source of data on the distributions of marine macroalgae for a number of research projects and government databases (e.g. Booth et al. 2006) and WELT is also where voucher specimens have been deposited (e.g. for the Marine Invasives Taxonomic Service, contracted to NIWA by the Ministry for Primary Industries).

The purpose of this study
The grooming and updating of the database has provided an opportunity to review the state of knowledge of the New Zealand macroalgal flora reflected in the collections at WELT, and to ask a series of questions. Which regions of New Zealand are represented by the most comprehensive and the least comprehensive collections? How has knowledge of the macroalgal flora been built over time in terms of the number of collections and the number of species recognised? Are there patterns that can be discerned in the collection history and coverage? What proportion of the flora is represented by sufficient individuals for study and comparative investigation (number of specimens, geographic range, seasonal distribution)? Do these collections have the potential to enable other types of biodiversity analyses?

Materials and methods
The data presented in this paper are drawn from the database of the Te Papa herbarium. Definitions for terms used in this paper are provided in Table 1. Data were assembled following several steps:

History of data and specimen information verification
Until the early 1990s, herbarium specimen data were available from the specimen labels and a hard copy register. The first electronic database system at Te Papa, Te Kahui, was custom-designed and implemented in 1993. Data were retrieved from the specimen labels and entered into the system by trained data-entry technicians. Where appropriate, extra information was sourced, the majority of this being latitude and longitude information derived from maps (NZMS 260 series). Most of the existing New Zealand algal specimens in the collection had an electronic record completed by ca. 2001. However, the record error rate was relatively high, in the order of 30-40%, with respect to coordinate data and locality information.
In 2005, all of the museum's electronic records were migrated to an electronic collection information management system, KE EMu® (referred to as EMu), a relational database customised for museum collections. With the implementation of the new database, there was scope to improve the quality of information recorded. Features of EMu, such as sophisticated search functions and global updating options, have facilitated cross-checking for consistency in the locality records, use of place names and collector details. The verification and grooming of the algal data (e.g. cross-checking of longitude and latitude, consistency of locality records, use of place names, collector details, identifying missing data fields and locating information where available) has become a core collection management activity since this time. However, much of this grooming effort has been somewhat opportunistic (for example, new acquisitions prompt a cross-checking of data for similar or nearby locations). Several special projects and requests for data have prompted more comprehensive data verification efforts. As part of this, several thousand backlog algal specimens were identified and databased; the database component of the work further prompted refinement of the consistency and accuracy of the locality data. Attention to the application of taxonomic concepts and names across the collection was also undertaken as part of this work.

Taxonomic framework
In the absence of a published flora, a current species names list and taxonomic hierarchy is maintained on the Te Papa website (http://www.tepapa.govt.nz/ - Dalen and Nelson 2013a-c). Changes to current taxonomic names and classification have been drawn from primary literature and updated into Te Papa's database, and the application of name changes to the collections has also been part of this effort.

Regional categorisation
Figure 1 illustrates the regional boundaries applied in this exercise. The boundaries reflect a combination of biogeographic boundaries in previously published accounts of the marine biota (e.g. Adams 1994, Nelson 1994, Shears et al. 2008), as well as province definitions employed by Te Papa for the plant collections.

Table 1. Definitions of terms used in this paper.

Term: Definition
Collecting event: For each region collections were sorted by year, then collection date and precise location. Each unique combination of date and precise location was treated as a collection event.
Duplicate records: Specimen duplicates, i.e. same taxon with identical collection data. (Only 1 example of duplicate sheets (e.g. labelled a-c) was retained; duplicates with different registration numbers were removed.)
Season: Collection dates were grouped by month and allocated to seasons as follows: December-February = summer; March-May = autumn; June-August = winter; September-November = spring.
Record: Single packet, box or specimen sheet.
Taxon: Name used in database which includes identifications to family, genus and species level as well as tag names (informal names assigned usually in preliminary stages of investigations or for entities recognised in the field).
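The definitions in Table 1 can be illustrated with a minimal Python sketch. It is not the procedure used with the Te Papa database (records were exported to Excel spreadsheets); the field names (`taxon`, `region`, `date`, `locality`), the helper functions and the sample records are all hypothetical, and "identical collection data" is approximated here as identical date and locality.

```python
from datetime import date

# Month-to-season allocation as given in Table 1 (austral seasons).
SEASON_BY_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}


def season_of(collected):
    """Return the season for a collection date."""
    return SEASON_BY_MONTH[collected.month]


def drop_duplicates(records):
    """Keep the first record for each (taxon, date, locality) combination.

    Assumption: 'identical collection data' is reduced to date and locality.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec["taxon"], rec["date"], rec["locality"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def collecting_events(records):
    """Return the unique (region, date, locality) combinations."""
    return {(rec["region"], rec["date"], rec["locality"]) for rec in records}


# Hypothetical records, for illustration only.
records = [
    {"taxon": "Taxon A", "region": "Wairarapa-Cook",
     "date": date(2010, 1, 14), "locality": "Site 1"},
    {"taxon": "Taxon A", "region": "Wairarapa-Cook",
     "date": date(2010, 1, 14), "locality": "Site 1"},  # duplicate sheet
    {"taxon": "Taxon B", "region": "Wairarapa-Cook",
     "date": date(2010, 1, 14), "locality": "Site 1"},
    {"taxon": "Taxon A", "region": "Wairarapa-Cook",
     "date": date(2010, 7, 2), "locality": "Site 1"},
]

unique_records = drop_duplicates(records)
events = collecting_events(unique_records)
```

In this toy example the four input records reduce to three unique records drawn from two collecting events, one in summer and one in winter.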

Download of data from Te Papa database
All New Zealand marine algal records were searched and grouped by latitude and longitude coordinates corresponding to the regions as defined above (20 categories) in Te Papa's database. Records, current to December 2011, were exported by region into Excel spreadsheets. Table 2 summarises the data used in the analyses undertaken. The number of taxa includes all recognised entities present in the collections, including some that have been recognised as distinct at a family, genus or species level but are currently unnamed. It is important to note that the publicly accessible flora lists

Results and interpretation
The number of new taxa represented in the collection by year of collection is presented in Figure 2, the cumulative total of taxa in the collections in Figure 3, and the number of taxa in the collections by decade of collection broken into divisions (red, brown, and green algae) in Figure 4. The surge in the number of collections around the 1870s reflects the material collected by both S. Berggren and H.H. Travers that was sent to Lund for examination by J.G. Agardh with duplicate material returned to New Zealand. Most specimens lodged prior to the 1930s were collected by W.A. Scarfe and R.M. Laing. In 1935 Josephine Tilden from the University of Minnesota, and a group of associates, visited New Zealand, collecting in the Bay of Islands as well as on Stewart Island, and material was distributed as "South Pacific Plants". V.W. Lindauer, the school teacher at Russell, Bay of Islands, was introduced to seaweeds by Tilden, resulting in his major contributions to New Zealand phycology (Cassie 1971, Cassie Cooper 1995, Nelson and Phillips 1996) [...] in new collections deposited by other marine research workers. During the 1980s and 1990s collections for the series of regional flora lists resulted in many new collections as well as new taxa. During the 2000s specific projects on elements of the flora (e.g. Bangiales (Nelson et al. 2001, 2003), Ulvaceae (Heesch et al. 2007, 2009), non-geniculate coralline algae (Harvey et al. 2005), macroalgae associated with soft sediments (Neill et al. 2012)) contributed to peaks in particular groups of algae. Figures 5a-f present the cumulative number of species recorded from selected regions (Kermadec Islands, NI North (Northern North Island), Wairarapa-Cook, Chatham Islands, Bounty Islands, Campbell Island), revealing the patterns of collecting history in greater detail. Analysis revealed that many entities in the flora are known from very few records (Table 3).
Of the 996 taxa in this analysis there are only 210 taxa in the collection for which there are more than 30 records (20 greens, 51 browns, 139 reds). Over the whole collection 17% of the taxa are known from a single record and 44% from five or fewer records.
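The kind of summary reported here (share of taxa known from a single record, from five or fewer records, and from more than 30 records) can be derived from a simple tally of records per taxon. The sketch below is illustrative only: the function name and the toy input are hypothetical, and it does not reproduce the figures reported from the WELT data.

```python
from collections import Counter


def records_per_taxon_summary(taxon_names):
    """Summarise how many taxa fall into each records-per-taxon class.

    taxon_names: one entry per record, giving the taxon of that record.
    """
    counts = Counter(taxon_names)
    n_taxa = len(counts)
    single = sum(1 for c in counts.values() if c == 1)
    five_or_fewer = sum(1 for c in counts.values() if c <= 5)
    over_30 = sum(1 for c in counts.values() if c > 30)
    return {
        "taxa": n_taxa,
        "single_record": single,
        "five_or_fewer": five_or_fewer,
        "more_than_30": over_30,
        "pct_single": 100.0 * single / n_taxa,
        "pct_five_or_fewer": 100.0 * five_or_fewer / n_taxa,
    }


# Hypothetical input: 31 records of taxon A, 5 of B, 1 of C, 6 of D.
toy_records = ["A"] * 31 + ["B"] * 5 + ["C"] + ["D"] * 6
summary = records_per_taxon_summary(toy_records)
```

Note that the "five or fewer" class includes the single-record taxa, which matches the way the two percentages are nested in the text.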
The data available for each of the 20 regions within New Zealand are summarised in Table 4. This lists the number of taxa, the composition of the flora in each region, the number of records on which the data are based, the composition of the flora by region in terms of the number of classes, orders and families represented, the number of years in which collections were made (of a potential 164 years between 1848 and 2011), and the seasons in which collecting events occurred. In some regions the collections have been made over an extended period and span all seasons, whereas it is clear that some other regions are infrequently visited, with no collections made in some seasons. Overall the impact of season is relatively modest, with the number of collections from the winter months being only two-thirds of the total from the summer months. Figure 6 summarises the number of taxa unique to each region by class.
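Two of the regional measures described above, the number of years in which collections were made out of the potential 164 and the seasons with no collecting events, can be sketched as follows. This is a hypothetical illustration, not the workflow used to build Table 4: the tuple layout, function name, region labels and sample values are assumptions.

```python
from collections import defaultdict

FIRST_YEAR, LAST_YEAR = 1848, 2011
POTENTIAL_YEARS = LAST_YEAR - FIRST_YEAR + 1  # 164 years, inclusive
ALL_SEASONS = {"summer", "autumn", "winter", "spring"}


def regional_coverage(records):
    """Summarise temporal coverage per region.

    records: iterable of (region, year, season) tuples, one per
    collecting event (hypothetical layout).
    """
    years = defaultdict(set)
    seasons = defaultdict(set)
    for region, year, season in records:
        years[region].add(year)
        seasons[region].add(season)
    return {
        region: {
            "years_collected": len(years[region]),
            "share_of_potential_years": len(years[region]) / POTENTIAL_YEARS,
            "seasons_missing": sorted(ALL_SEASONS - seasons[region]),
        }
        for region in years
    }


# Hypothetical collecting events, for illustration only.
toy_events = [
    ("Region A", 1908, "summer"),
    ("Region A", 1982, "summer"),
    ("Region A", 1982, "spring"),
    ("Region B", 2001, "winter"),
]
coverage = regional_coverage(toy_events)
```

Counting distinct years rather than events keeps a single intensive expedition from inflating the apparent length of the collecting history.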

Discussion
These analyses have provided an opportunity to review the state of knowledge of the New Zealand macroalgal flora and to investigate how well the current collections at Te Papa represent the macroalgal flora of New Zealand. The cumulative total of taxa in WELT shows that the flora is not reaching an asymptote, suggesting that more discoveries are likely with further investigations of the flora. It is clear that some regions of New Zealand have received greater attention (number of records, collecting events) (Table 4), and as a consequence the flora is better understood in these areas (in particular Wairarapa-Cook, NI North). The number of collections available by region is in large part a consequence of the presence of active research workers, research institutions or programmes, as well as the accessibility of the coastline. The northern and southern island groups (Kermadec and Three Kings Islands, and Snares, Bounty, Antipodes, Auckland and Campbell Islands) are all difficult to reach, subject to inclement weather and relatively infrequently sampled, i.e. collections have been made in fewer than 20 of the 164 years since the first New Zealand collections were lodged in WELT (Table 4). The knowledge of the flora in different regions has been built up in quite different ways. Macroalgae were first collected from the Kermadec Islands by New Zealand based scientists in 1908, although material had been collected in 1854 and 1874 by expeditions and lodged in European herbaria (Nelson and Adams 1984). It was not until collections were made by a trained phycologist in 1982 that the flora was more thoroughly understood (Fig. 5a). There have been no subsequent targeted collections from the Kermadec Islands, with only occasional opportunistic collections deposited in WELT (Fig. 5a).
This collection history, coupled with the fact that 82% of the flora is currently known from 5 or fewer records (Table 4), strongly suggests that the macroalgae of this region are under-represented in the collection. The flora of the Kermadec Islands differs markedly from the rest of New Zealand, with strong affinities to the warm-water regions of the Pacific and Indian Oceans (Nelson and Dalen in press). Of the 152 taxa recorded in our data from the Kermadec Islands, almost half are represented in the New Zealand region only in these northern islands (Figure 6).
The collection histories of the NI North (Fig. 5b) and Wairarapa-Cook (Fig. 5c) are interesting to compare. These are the most diverse and intensely sampled regions. The Bay of Islands is a key collecting area in the NI North and has seen bursts of collecting activity, by Tilden and Lindauer in particular. The collecting history of the Wairarapa-Cook region reflects the presence of phycologists and collectors, with a steady growth in the knowledge and representation of the flora over an extended period. The first collections of macroalgae from the Chatham Islands that formed the basis of a published account were made in 1863-64 and then in 1871, with further collections made by a German research expedition in 1897. However, it was not until the mid-1980s that thorough and detailed collections were made of the flora (Fig. 5d). In the intervening 80-90 years there were only scattered and infrequent collections made on the islands (summarised in Nelson et al. 1991). Based on the number of collections the Chatham Islands flora is now relatively well represented at Te Papa, although 18% of the flora is still known from only a single record. The collecting history of the southern Bounty Islands (Fig. 5e) (which are only 1.3 km² in area and 700 km distant from the nearest landmass) and Campbell Island (Fig. 5f) (113 km² in area, also 700 km distant from the nearest landmass) reflects problems of access. This is also shown in the seasonal breakdown of collecting years (Table 4).
In terms of regional diversity the northern North Island has the most taxa recorded, contains the highest proportion of the total flora, and has the greatest phylogenetic diversity (as represented by the highest numbers of classes, orders and families present). Although there are almost 3000 specimen records from the region, 25% of the taxa from the area are known from a single record, and 60% from 5 or fewer records. The offshore Three Kings Islands and the southern islands (Snares, Bounty, Antipodes, Auckland and Campbell Islands) are represented by the smallest number of collections, resulting from few collecting events. The mainland areas that have received the least collecting effort (fewest collection events) are North Island East, South Island Northwest, and South Island Southern.
The proportion of the flora that is represented by a very small number of records is salutary, with ca. 44% of the flora known from 5 or fewer records (Table 3). When the data are examined by region (Table 4) the differences in the coverage of collections can be assessed. For phenological studies and comparative investigations it is important to have a number of specimens to evaluate variation and attributes that may be influenced by maturity, seasonality, and/or reproductive status. There are 210 taxa in the collections, identified to species, for which there are more than 30 records: in our view, this number of records provides sufficient individuals for such comparative studies. In terms of the application of NHCs for understanding the responses of the flora to human-induced environmental changes, Johnson et al. (2011) consider that collections with "large numbers of common taxa are the most useful as time series for determining species level responses" although they note that such collections "typically have been perceived as of low priority for acquisition or curatorial effort".
Our analyses have enabled us to test the quality of the data associated with the specimens. The data grooming exercises prior to these analyses have minimised location errors (e.g. latitude and longitude, place names) but opportunity for minor transcription errors still remains. The main issues affecting data quality are the level and standard of identification, which are directly influenced by the current state of the systematic knowledge of the flora. There have been few monographic studies of macroalgal taxa in New Zealand, but recent research across a range of orders has revealed new taxa, and the need for significant taxonomic revisions. These studies have also concluded that understanding the diversity in the flora is still in a discovery phase (e.g. Broom et al. 2004, Heesch et al. 2009, D'Archino et al. 2011). Although Te Papa's collection data are the best available at present within New Zealand, our analyses have revealed that the macroalgal flora is currently poorly represented in terms of numbers of records for many taxa, as well as in the geographic and seasonal spread of specimens. Although it is questionable whether the Te Papa collections constitute a comprehensive or sufficient baseline with which to evaluate change in the environment or in the flora composition, these herbarium specimens are a very significant source of data both for current biodiversity assessment and planning and also for future applications in biodiversity analysis, conservation and ecology. There are areas within the collection that have been developed from specific research programmes (e.g. collections for the regional floral list series, Bangiales, coralline algae, macroalgae from soft sediment environments, Ulvaceae) where the specimens have been collected in a systematic and targeted way, and in some cases can be associated with other key environmental data.
These collections provide a reference baseline if there are opportunities for the regions or habitats to be resampled in the future. The analyses performed here are repeatable if the collection continues to receive the same attention, i.e., expert identification and application of current names, and precision of data entry with respect to consistency of place names and coordinate data. This analysis has identified gaps in the macroalgal collections, both taxonomically and geographically, and also data that can inform future collection development. A number of recent papers reviewing the role of NHCs have stressed the function of museums and herbaria as "part of the essential infrastructure of science" (Johnson et al. 2011) and their value to conservation biologists and ecologists for studying species' distributions and abundance (Newbold 2010). Institutions faced with the expense of care and maintenance of NHCs need to have strategically focused research and collection development policies which identify the opportunities for their collections to serve not only research on biosystematics, distribution and evolution of biotas, but also to have wider applications for environmental and conservation science. Whilst opportunistic collections can be valuable in providing material to complement existing material, and have often resulted from collectors seeking to maximise field opportunities and access to infrequently visited areas, there is a need to move to a more systematic approach to the sampling of diversity to provide higher quality data. Ward (2012) recommends that NHCs "must become drivers of biodiversity science" and suggests four key priorities for NHCs: mass databasing, analysis of holdings, identification of ecological datasets, and repositories of ecological projects.
We have addressed the first two of these priority areas and have also identified datasets within the Te Papa macroalgal herbarium that have potential to serve as baselines for future research. At present the herbarium is not equipped to serve as a repository for ecological projects. This aspect of future-focused work is challenging and considerable care will be needed to develop data protocols to record information about sampling effort, population size and other ecological attributes.