Data and Knowledge Preservation

Earthzine: Impacts of Invasions 2017, Original, Themed Articles

Invasive species data and knowledge preservation is vital for future research, and collection methods impact data quality.

Collecting data on paper with a separate GPS unit and camera. Image Credit: Karan Rawlins

Data and knowledge preservation is a constant struggle against the advance of time and the focus on the present and future. Historically, knowledge was passed from person to person in what is now referred to as oral or traditional knowledge. Even today, this is still practiced in many fields as a type of institutional memory or institutional knowledge: facts, experiences, and knowledge are passed from peer to peer or supervisor to employee, or amassed by the individuals in an organization or field.

In invasive species research and management, this may refer to a technician knowing how to best use quirky equipment, where certain invasive species populations were previously eradicated, or why certain local areas may be more susceptible to invasion. However, this system only works if there is someone to speak it, someone to hear it, and if the knowledge doesn’t degrade over time.

Without publicly available written accounts, so-called “common” knowledge and data may not be available to others or considered scientifically valid, as there is little consensus on what constitutes “common knowledge” and “common knowledge” is not a static set of facts (Shi, 2011). This can also affect future studies and questions if past observations are made but not written down, especially in cases of rare or time-sensitive occurrences.

Without written accounts or published material, it is difficult to assess plant phenology change over time, ecosystem-wide plant community regeneration after rare natural events (e.g., floods, droughts, tsunamis), or the timing of insect or disease emergence in response to varying conditions (Gibney and Van Noorden, 2013). Published accounts or available data can lend validation to scientific theories, provide data for experiments, and even inspire new experimental questions (Layne et al., 2012). They also allow future researchers to test ideas that were based on institutional knowledge.

Methods for recording data in invasive species management differ, more so now than in previous decades. Broadly, data can be collected on paper or electronically. Previous research comparing paper and electronic data collection has shown that electronic methods are quicker, more accurate, and less expensive.

Leisher (2014), Thriemer et al. (2012), and Olson et al. (2014) all found that recording data electronically reduced the average time required by 45 to 58 percent compared to paper recording, some of which can be attributed to using built-in smartphone/tablet or programmatic features. Data turnaround time is less than 24 hours electronically, compared to days or months for paper (Olson et al., 2014; Thriemer et al., 2012). Thriemer et al. (2012) and Olson et al. (2014) found that recording data electronically reduced errors from 7-10 percent for paper to 0-1 percent.

In a comparison of the GPS accuracy of a Droid X, an iPhone 4, and a Garmin eTrex, the Garmin reduced location error by less than 1 percent (Olson et al., 2014). Leisher (2014) found that surveys conducted with tablets cost 74 percent less per interview than paper, with the majority of the savings found in reduced data cleaning and enumerator fees.

Barriers to implementing electronic data collection may include lack of technological support, poorly designed recording software, lack of user motivation or participation, and prohibitive direct costs (Welker, 2007).

Collecting data via a smartphone. Image Credit: Rebekah D. Wallace

Paper data collection and the main methods of electronic data collection (proprietary software, open software, and aggregate databases) all have benefits and drawbacks (Table 1). Many people still prefer to collect data via paper and pen, as it easily allows for additional notes and observations and the process is only as complicated as the data that needs to be recorded. However, the information is stuck on paper until someone transcribes it into a computer program; the method has no strict standardization, is error prone, and is costly due to double-data entry and error checking (Leisher, 2014; Thriemer et al., 2012; Olson et al., 2014). It may also require separate electronic equipment, such as a GPS device to record coordinates for invasive species mapping.
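The double-data entry check mentioned above can be illustrated with a minimal Python sketch. It assumes two people have independently keyed the same paper forms into lists of records; the function name `compare_entries`, the field names, and the sample values are hypothetical and are not taken from any of the cited studies.

```python
def compare_entries(first_pass, second_pass):
    """List every field where two independent transcriptions disagree.

    Each argument is a list of dicts, one dict per paper form, in the
    same order. Returns (row_index, field, first_value, second_value).
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must contain the same number of forms")
    mismatches = []
    for index, (a, b) in enumerate(zip(first_pass, second_pass)):
        # Check the union of keys so a field missing from one pass is reported.
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((index, field, a.get(field), b.get(field)))
    return mismatches


# Made-up example values, for illustration only.
first_pass = [
    {"site": "A", "species": "Alliaria petiolata", "count": "12"},
    {"site": "B", "species": "Alliaria petiolata", "count": "3"},
]
second_pass = [
    {"site": "A", "species": "Alliaria petiolata", "count": "12"},
    {"site": "B", "species": "Alliaria petiolata", "count": "8"},
]

for row, field, first_value, second_value in compare_entries(first_pass, second_pass):
    print(f"form {row}: {field} is {first_value!r} in pass 1, {second_value!r} in pass 2")
```

Every disagreement still has to be resolved by going back to the paper form, which is part of why this approach is slow and costly compared with recording the data electronically in the first place.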

Some programs use proprietary software to record and store data. The forms and structure are usually adaptable or designed for data recording in a broad scientific field, require some data standardization, and will often have a section to record metadata. There may be no need for transcription if the software can be loaded onto a tablet or smartphone or if researchers can import from another, more mobile program, which reduces errors and cost. However, proprietary software often has a large one-time purchase or annual subscription fee, and the user is reliant upon the developers continuing to provide support.

As of March 2017, an ArcGIS Online subscription costs $450-$500 annually, based on the number of users per subscription. Proprietary software also may have limitations in exporting or converting data to other formats, which makes it difficult to use the data in another program (e.g., for analysis, mapping, modeling) and for sharing with others.

Open software, in this example, covers programs that people can use for free or that have been bought for another purpose and are then used to record data. This includes Google Sheets, Forms and Documents, which are free for personal use, and Microsoft Office Excel and Word, which cost $149.99 as of March 2017 for one computer, as well as the myriad of free or low-cost online form builders and data collection applications. This method often has many of the same benefits as proprietary software, with the addition of being low-cost to free, often highly customizable, sharable, and convertible to other file formats, though there can be a bit of a learning curve to creating more involved forms (Welker, 2007).

Last, aggregate databases offer many of the same benefits as open software, but they also immediately protect the data recorders from data loss or inaccessibility, and the data are available to be shared immediately. Aggregating data on similar subjects creates an overall more robust data set for subsequent users. Examples include the Biota of North America Program, U.S. Department of Agriculture Plants Database, Global Biodiversity Information Facility, and EDDMapS. There is a concern among scientists that such ready sharing of data doesn’t address certain privacy concerns and that the data could be scooped and published before they are able to do so (Harding et al., 2012; Soranno et al., 2015).

While there may be certain barriers and initial training needed, electronic data collection has proven to be quicker in input, quicker in results availability, less expensive, and less error prone (Welker, 2007).

Of ongoing concern is the fact that, at a project’s inception, many programs do not take into account the long-term preservation of the data they will generate, and so don’t have a plan for how to house or share the data when the project is over.

Table 1: Data collection tools analyzed

Knowledge Case: Garlic Mustard in Georgia

In 2005, Chris Evans found a population of garlic mustard (Alliaria petiolata) on Kennesaw Mountain in Georgia (Markiewicz, 2005). He researched online, with local experts, and with the Kennesaw Mountain National Battlefield Park staff to find out whether this population had been previously reported, and found no records or institutional knowledge (Evans, 2017). The Atlanta Journal-Constitution wrote an article about the find and the planned weed-pull day; Chris Evans stated that the anticipated infestation area was only 0.5 acre (Markiewicz, 2005). A subsequent article noted that 50 volunteers attended the weed-pull, and the infestation was found to be more extensive, covering about 17 acres (EDDMapS, 2010; Starks, 2005).

After the initial article was published, a local self-taught botanist, Scott Ranger, contacted Evans to inform him that he had found that population on May 10, 1989 (EDDMapS, 2010; Ranger, 2009). Ranger had not submitted a sample to an herbarium, but had told an invasive species expert in Ohio. Ranger had also told the park superintendent of the infestation, but it was not managed, and by the time Evans “discovered” the patch, no one who knew remained at the park.

In 2009, this story was going to be used as an example of early detection and rapid response and the importance of publicizing new invasive species populations, so Chuck Bargeron and Evans reached out to Ranger. In Ranger’s 2009 email, he wrote that he found a very small population in 1989, then “Chris Hughes” rediscovered it about “5-6 years ago” and arranged some AmeriCorps volunteers for vegetative control (Ranger, 2009).

From 2009 to 2016, the story presented at meetings across North America was that Ranger discovered the population in 1989, “Chris Hughes” found it in 2003-2004, and then Evans rediscovered it in 2005. However, recent investigation into “Chris Hughes” revealed that he doesn’t actually exist. Reviewing the 2005 Atlanta Journal-Constitution article, it was found that the journalist misattributed the garlic mustard find and quote to “Chris Hughes” rather than to Chris Evans (Markiewicz, 2005). Ranger’s email likely referenced the name and control efforts from that article, misremembering the date. Given that information, and that no park staff knew about the garlic mustard only 1-2 years after any control efforts by “Chris Hughes,” it was determined that his existence was based solely on incorrect knowledge being published and verbally perpetuated. However, having a public record and preserved emails to investigate allowed this knowledge to be corrected, a feat that would have been much more difficult without documentation.

Ultimately, without the publicity that Evans was able to bring to the invasion, Ranger’s “original” report may not have been found. However, if Ranger had had a route to a more receptive audience, there might have been a chance to eradicate that population.

Data Case: What’s Invasive

‰ÛÏWhat’s Invasive‰Û was an invasive species reporting program developed in 2008 by the University of California-Los Angeles along with other members of the Center for Embedded Networked Sensing (University of California-Los Angeles, 2010). The goals of the program were to provide scientific data on invasive species occurrences and promote invasive species awareness to the general public. This project was funded by a five-year grant and developed a website and the first successful smartphone application for reporting in-field observations. The program garnered 7,863 reports over the five-year term of the project (University of California-Los Angeles, 2010).

However, the project didn’t have a long-term plan for data preservation at its inception. Prior to the project’s end, the program worked with Bargeron at the UGA-Center for Invasive Species and Ecosystem Health (Bugwood) to convert the What’s Invasive data and house it in the EDDMapS aggregate database (University of California-Los Angeles, 2010). At the project’s culmination in 2012, the data, as well as support of the “What’s Invasive” website and smartphone application, were transferred to Bugwood to ensure that the data would continue to be available to the public.

Les Mehrhoff conducting IPANE training on invasive species identification and management. Image Credit: Steve Manning, Invasive Plant Control Inc.

Data Case: Invasive Plant Atlas of New England

The Invasive Plant Atlas of New England (IPANE) was initiated in 2001 as a collaboration between the University of Connecticut, New England Wildflower Society, The Center for International Earth Science Information Network (CIESIN) and the Silvio O. Conte National Fish and Wildlife Refuge (Invasive Plant Atlas of New England Training Manual, 2011). The project area covered Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont and was grant funded by the U.S. Department of Agriculture’s National Research Initiative in 2001, 2005, and 2008.

The initial goals of the program were to evaluate the status and spread of invasive alien species, increase public awareness, establish data collection and dissemination methods, and develop early detection capabilities (Bois et al., 2011). To this end, a training program and materials were developed to aid citizen scientists in the identification and reporting of invasive plants.

To ensure the data was sound, any new or difficult-to-identify species were independently verified by experts. In 2009, planning began to move the website and reporting portion of the program to Bugwood to preserve the occurrence data and technology created. In 2010, Dr. John Silander assumed primary directorship over the program and continued the effort to migrate the IPANE website, smartphone application, and reported data to Bugwood and the EDDMapS database (Invasive Plants Council, 2011). In 2012, the migration was completed, which allowed IPANE data to be aggregated with other invasive species data across North America. There wasn’t dedicated funding to continue technical support and updates to the IPANE smartphone applications, so in 2014 the primary reporting application for the Northeast US became “Outsmart Invasives.” As all the data from these applications is housed in the EDDMapS database, no records have been lost.

Conclusions

Institutional knowledge preservation is an important factor often overlooked in many programs. There are many aspects of a program’s operation that aren’t captured as cleanly as numbers and other data types, but metadata and other procedures can be important in the interpretation of data. Data preservation must be considered whenever a project or program is initiated or ending, or whenever a plan doesn’t already exist.

If possible, data must be collected in such a way that it conforms to existing standards. Data collection and mapping standards were developed, and recently updated, by the North American Invasive Species Management Association to encourage the collection and standardization of certain mandatory and optional fields with regard to invasive species (North American Invasive Species Management Association, 2014). The best option is to choose a data recording method that encourages standardization, has few errors, is sharable, and is low-cost, and a data storage procedure that ensures data is not lost, inaccessible, or destroyed. Data stored on paper or on a single computer or hard drive is highly susceptible to becoming destroyed, corrupted, or inaccessible over time. When an aggregate database is used as the primary venue to house data, the data is more likely to survive beyond any individual project or program, and it is available to others to use in their own research.
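As a rough illustration of checking records against a standard before they are stored, the Python sketch below flags missing mandatory fields and implausible coordinates. The field list in `REQUIRED_FIELDS` is a hypothetical stand-in and is not the actual set of mandatory fields in the North American Invasive Species Management Association standards; the sample record uses placeholder values.

```python
# Hypothetical mandatory fields, for illustration only. Consult the actual
# standard for the real list of mandatory and optional fields.
REQUIRED_FIELDS = ("species", "latitude", "longitude", "observed_on", "observer")


def validate_record(record):
    """Return a list of problems found in one occurrence record (a dict)."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {name}")
    latitude = record.get("latitude")
    longitude = record.get("longitude")
    if isinstance(latitude, (int, float)) and not -90 <= latitude <= 90:
        problems.append("latitude out of range")
    if isinstance(longitude, (int, float)) and not -180 <= longitude <= 180:
        problems.append("longitude out of range")
    return problems


# Placeholder record; the coordinates, date, and observer are made up.
record = {
    "species": "Alliaria petiolata",
    "latitude": 34.0,
    "longitude": -84.5,
    "observed_on": "2017-03-01",
    "observer": "Example Observer",
}

print(validate_record(record))                             # complete record
print(validate_record({"species": "Alliaria petiolata"}))  # incomplete record
```

Running a check like this at the time of recording, rather than months later during data cleaning, is one way a recording method can encourage standardization.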

Rebekah D. Wallace is the EDDMapS data coordinator at the University of Georgia – Center for Invasive Species and Ecosystem Health.

Charles T. Bargeron is the associate director – Invasive Species and Information Technology at the Center for Invasive Species & Ecosystem Health and has a Public Service Faculty appointment split between the Warnell School of Forestry and Natural Resources and the Department of Entomology at the University of Georgia.

Joseph H. LaForest is associate director for the University of Georgia’s Center for Invasive Species and Ecosystem Health (Bugwood) and leads the Integrated Pest Management (IPM) and Forest Health programs.

References

[1] S. T. Bois, (2011, October 15) “Keeping track of invasive alien species with IPANE” [Online] Available http://www.ecolandscaping.org/10/invasive-plants/keeping-track-of-invasive-alien-species-with/ipane/

[2] S. T. Bois et al., “Invasive plant atlas of New England: the role of citizens in the science of invasive alien species detection” BioScience, vol. 61, no. 10, pp. 763-770, Oct. 2011. doi: 10.1525/bio.2011.61.10.6.

[3] EDDMapS About – History, (2010) [Online] Available http://www.eddmaps.org/about/history.cfm

[4] C. Evans, private communication, Mar. 2017.

[5] E. Gibney and R. Van Noorden, “Scientists losing data at a rapid rate” Nature News & Comment, (2013, Dec. 19) [Online] Available http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416 doi: 10.1038/nature.2013.14416

[6] A. Harding et al., “Conducting research with tribal communities: sovereignty, ethics, and data-sharing issues” Environ. Health Persp., vol. 120, no. 1, pp. 6-10, Jan. 2012.

[7] Invasive Plant Atlas of New England Training Manual, (2011) [Online] Available http://bugwoodcloud.org/eddmaps/ipane/volunteers/training_materials/ipanetraining_manual.pdf

[8] Invasive Plants Council, (2011, Dec. 16) “Invasive Plants Council Twelfth Annual Report” pp. 75-80. [Online] Available http://cipwg.uconn.edu/wp-content/uploads/sites/244/2013/12/IPC2011AnnualReport.pdf

[9] R. Layne et al., “Long term preservation of scientific data: lessons from jet and other domains” Fusion Eng and Design, vol. 87, pp. 2209-2212, 2012.

[10] C. Leisher, “A comparison of tablet-based and paper-based survey data collection in conservation projects” Soc. Sci., vol. 3, pp. 264-271, 2014. doi: 10.3390/socsci3020264.

[11] D. A. Markiewicz, “Garlicky weed storms Kennesaw Mountain” Atlanta Journal-Constitution (28 May 2005).

[12] North American Invasive Species Management Association, (2014) North American Invasive Plant Mapping Standards. [Online] Available http://www.naisma.org/standards

[13] D. D. Olson et al., “Monitoring wildlife-vehicle collisions in the information age: how smartphones can improve data collection” PLOS ONE, vol. 9, Jun. 2014. doi: 10.1371/journal.pone.0098613.

[14] S. Ranger, (2009, Aug. 16) Garlic Mustard at Kennesaw Mountain [E-Mail]

[15] L. Shi, “Common knowledge, learning and citation practices in university writing” Nat. Council of Teachers in English, vol. 45, no. 3, pp. 308-334, Feb. 2011.

[16] P. Soranno et al., “It’s good to share: why environmental scientists’ ethics are out of date” BioScience, vol. 65, no. 1, pp. 69-73, Jan. 2015.

[17] K. Starks, “Weed war park takes quick action on aggressive, fast-spreading vine” Marietta Daily Journal (4 June 2005).

[18] K. Thriemer et al., “Replacing paper data entry collection forms with electronic data entry in the field: findings from a study of community-acquired bloodstream infections in Pemba, Zanzibar” BMC Res. Notes, vol. 5, pp. 113-119, 2012. doi: 10.1186/1756-0500-5-113.

[19] University of California – Los Angeles, (2010) Center for Embedded Networked Sensing: Participatory Sensing (PART) Research Projects – Part 03 – What’s Invasive & Project Budburst. Available http://wayback.archive.org/web/20100708022918/http://research.cens.ucla.edu/urban/2010/part03.pdf

[20] J. A. Welker, “Implementation of electronic data capture systems: barriers and solutions” Contemp. Clinical Trials, vol. 28, pp. 329-336, 2007. doi: 10.1016/j.cct.2007.01.001