Ellsworth LeDrew, University of Waterloo, Waterloo, ON Canada
Mark Parsons, National Snow and Ice Data Center, Boulder, CO, USA
Taco de Bruin, NIOZ Royal Netherlands Institute for Sea Research
We are in the midst of one of the most exciting international and interdisciplinary science projects that many of us will encounter in our professional careers – the International Polar Year. Scientists in the natural, social, and health sciences are collaborating on some 228 endorsed projects in both the Arctic and Antarctic during a two-year period (March 2007-March 2009) of intense field observations. These science projects address crucial issues at a critical time in the evolution of the earth system.
During the first International Polar Year (1881 – 1884) eleven nations established 14 principal research stations across the Polar Regions; twelve were in the Arctic, along with at least thirteen auxiliary stations. For more information about the first IPY, visit www.arctic.noaa.gov/aro/ipy-1/.
A common thread in all projects is how we manage the data for collaboration now, during this IPY, and in the future as new science topics and issues emerge. The first IPY was in 1882-83 and very little data from that ground-breaking program remain. Imagine the possibilities of assessing polar environmental change using comparable data for 1882 and 2007. The second IPY was in 1932-33 and, again, very little data remain, possibly as a consequence of the destruction of records during the Second World War. The third IPY, which became better known as the International Geophysical Year of 1957-58, provided solid data, and some graduate students who worked during that program are working in the IPY program today as project leaders!
The key word of this experience is legacy. Data management has been foremost in the minds of the architects of the current IPY: “Building an integrated data set from the broad range of IPY research activities represents one of IPY’s most daunting challenges. An enduring data set, accessible to scientists and the public during IPY and for many decades into the future, will represent one of IPY’s strongest legacies” (The Scope of Science for the International Polar Year, 2007-2008). “In fifty years time the data resulting from IPY 2007-2008 may be seen as the most important single outcome of the programme” (A Framework for the International Polar Year).
To provide this legacy, we need to ensure that data are fully documented, so that future users can assess the nature of the observations and the quality of the data and analyze them consistently. The data themselves must also be readily accessible and preserved in accordance with international standards. This is not a simple process. The Data Management and Policy Subcommittee of IPY has been assessing the issues and provides guidance to scientists. A strong and forward-looking International Data Policy has been endorsed by IPY. Several countries have provided resources and support to their scientists so that they may meet the expectations of that policy.
A major hurdle in achieving the policy’s goal of free, open, and timely data access is a lack of understanding by IPY scientists of relevant standards and of how to archive their data properly so that they may contribute to the broad IPY legacy. Our experience through interaction with scientists in workshops, conferences, and personal conversations indicates that many do not understand the distinction between ‘metadata’ and the actual observed data to be analyzed. For scientists to use data that others have collected and that may be of value to their own project, they must first be able to find the data and then be able to assess their characteristics, including the nature of the instrumentation, accuracy, precision, location, and environmental conditions. These characteristics are recorded in formal descriptions called ‘metadata’ that help researchers ‘discover’ useful information. Hence the term ‘discovery portal’: a tool or interface that provides searchable linkages to many distributed data access nodes. A portal may use descriptions such as those designed by the Federal Geographic Data Committee (FGDC), the Global Change Master Directory (GCMD), or the International Organization for Standardization (ISO 19115).
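The distinction between metadata and observed data can be made concrete with a small sketch. The record layout, field names, catalogue entries, and `example.org` addresses below are invented for illustration; they are not the elements of FGDC, GCMD, or ISO 19115, and the `discover` function is only a stand-in for what a real discovery portal does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Describes a data set; it does not hold the observations themselves."""
    title: str
    instrument: str
    accuracy: str
    south: float      # southern edge of coverage, degrees latitude
    north: float      # northern edge of coverage, degrees latitude
    start_year: int
    end_year: int
    access_url: str   # placeholder address where the data could be requested


def discover(records, keyword, min_lat, year):
    """Return records whose title mentions `keyword`, whose coverage
    reaches at least `min_lat`, and whose time span includes `year`."""
    keyword = keyword.lower()
    return [
        r for r in records
        if keyword in r.title.lower()
        and r.north >= min_lat
        and r.start_year <= year <= r.end_year
    ]


# Invented catalogue entries, for illustration only.
catalogue = [
    MetadataRecord("Sea ice thickness transects", "ice auger", "+/- 2 cm",
                   70.0, 82.0, 2007, 2008, "https://example.org/data/1"),
    MetadataRecord("Coastal air temperature", "thermistor", "+/- 0.1 C",
                   60.0, 68.0, 2007, 2009, "https://example.org/data/2"),
    MetadataRecord("Sea ice extent survey", "ship log", "unknown",
                   -75.0, -60.0, 2008, 2008, "https://example.org/data/3"),
]

hits = discover(catalogue, "sea ice", min_lat=66.5, year=2007)
for r in hits:
    print(r.title, "|", r.instrument, "|", r.accuracy, "|", r.access_url)
```

The search touches only the descriptions. A researcher can judge instrumentation, accuracy, and coverage before requesting any observations, which is why sharing metadata early costs the data collector very little.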
Some portals use a defined subset of information that is common to multiple standards to enable improved interaction across multiple nodes of a data system. One example of this approach is a NASA project called Discovery, Access, and Delivery of Data for IPY (DADDI). DADDI provides descriptions of and direct access to data held by a growing number of data centers around the world. The project is also exploring methods for allowing researchers to ‘visualize’ the data on a computer-generated map of the Arctic.
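The idea of a common subset can be sketched as a simple field mapping. This is not how DADDI is implemented; the two nodes, their native field names, and the three shared fields below are all hypothetical and do not correspond to the actual elements of any metadata standard.

```python
# Hypothetical mappings: shared field name -> each node's native field name.
NODE_A_MAP = {"title": "dataset_title", "summary": "abstract_text", "contact": "originator"}
NODE_B_MAP = {"title": "EntryTitle", "summary": "Summary", "contact": "Personnel"}


def to_common(record, field_map):
    """Project a node-specific record onto the shared subset.

    Fields outside the subset are dropped; shared fields that the
    node did not supply are set to None.
    """
    return {common: record.get(native) for common, native in field_map.items()}


# Invented records from two imaginary data nodes.
node_a_record = {
    "dataset_title": "Snow depth, site 1",
    "abstract_text": "Daily snow depth.",
    "originator": "Team A",
    "projection": "polar stereographic",  # node-specific, not in the subset
}
node_b_record = {
    "EntryTitle": "Permafrost temperature",
    "Summary": "Borehole temperatures.",
}

portal_index = [
    to_common(node_a_record, NODE_A_MAP),
    to_common(node_b_record, NODE_B_MAP),
]
for entry in portal_index:
    print(entry)
```

Keeping the shared subset small is a trade-off: a portal can search every node uniformly, but richer node-specific detail stays with the originating data center.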
Sgt. Winfield Jewell taking meteorological observations at Fort Conger, Grinnell Land, August 1882. Very little data remain from the first IPY. (Photo credit: NOAA)
Implicit in the concept of metadata is that colleagues should have access to your data. This challenges some ingrained traditions in the sciences: hold on to your data until you can publish a paper or your student defends his or her thesis. These traditions of restricting access to data can undermine a fundamental principle of IPY, which is to encourage collaboration internationally and across disciplines. Our goal is that this principle of sharing data and fostering collaboration will eventually extend beyond individual IPY project partners, allowing other IPY projects to acknowledge and take advantage of your work. This may lead to new developments not foreseen at the proposal stage. As one IPY researcher said, “Every time I share data, I learn something.” This is a fundamental rationale for ‘discovery’ portals that allow searching of metadata.
In practice, we have found little opposition to the immediate sharing of metadata once scientists learn that this involves only the descriptions of the data, not the analysis data themselves. On the other hand, there should also be provision for sharing the analysis data. For a two-year International Polar Year, the time is too short for scientists to wait until a thesis is defended or a paper is published before making the data available to others. The IPY Data Policy requires that: “IPY data, including operational data delivered in real time, are made available fully, freely, openly and on the shortest feasible timescale.” Exceptions only apply to protect confidentiality of information about human subjects, to respect the needs and rights of holders of local and traditional knowledge, and to ensure that data release does not lead to harm of endangered or protected resources. Yet we hear from some scientists about ‘ownership’ of and ‘embargoes’ on data. This indicates that we need a shift in the culture of science to achieve the spirit of ‘shortest feasible timescale,’ which means months, not years.
One approach that could facilitate this culture shift is to place a higher value on the publication of data, a value equivalent to that of a refereed publication in the merit review and promotion of scientists. It is essential that we formally recognize the substantial intellectual effort involved in creating a useful data set. IPY strongly encourages the formal citation of data, and some scientific journals have adopted or are considering a requirement that data be cited when used in articles they publish. Where formal citation is not possible, such as with some medical and social science data, ethical policies for data collection and data use are encouraged, building upon existing models such as Article 8(j) of the 1992 Convention on Biological Diversity, though more work remains to be done on this topic.
Another significant focus of discussion is data rescue: retrieving data hidden away in various labs or offices of retiring professors, data on mildewing paper logs or obsolete media, or data from defunct government projects. One approach is for scientists to inventory major collections of existing data and information and then to set priorities for the rescue and permanent preservation of the data and information that are most valuable and at greatest risk. An example of effective data rescue is the work of David Atkinson who, while a graduate student at the University of Ottawa, gathered all of the climatological data for Canada’s Queen Elizabeth Islands that were opportunistically collected as part of individual expeditions. These disparate data sets have been collated into an electronic database that has proven to be a significant resource to polar scientists. These are data that were not available through Canada’s traditional Environment Canada network. There is no specific IPY project for data rescue, but many IPY project scientists have emphasized the need. Such work is being done when time and funding permit, but there should be a coherent data rescue initiative with specific guidelines and standards.
Perhaps the greatest legacy of the IPY will be recognition of the importance of effectively managed data and the development of formal and professional data management. From IPY we will develop lessons and best practices that could form the basis for graduate courses in scientific data management. When the value of interdisciplinary and international access to data through standardized protocols becomes clearly demonstrated as a result of IPY, financial support for data and information management could become a routine and required component in all research budgets. The evaluation criteria for assessing research proposals should include evaluation of data management plans and capabilities. This would encourage recognition and credit for scientists producing data as well as analyzing data.
A modest start that will help us evolve towards that goal is the development of a wiki for ‘Best Practices in Polar Data Management’. This will evolve as a consequence of the experiences of the next few years. For example, in Canada such a site is being planned as part of the “Polar Metadata Catalogue”, which incorporates the Canadian IPY Master Directories as well as metadata for other polar projects such as ArcticNet and the National Contaminants Program. This is modeled after the ‘Best Practices’ wiki that has been developed by the Architecture and Data Committee for GEOSS to provide similar guidance for practitioners based upon the experiences of their peers.
This is a time of great optimism for the management of scientific data. IPY has provided the opportunity and guidance to enable extensive multidisciplinary and multinational collaboration using shared data and recognized protocols. The success of IPY will demonstrate the value of this approach. Funding agencies can be more proactive in requiring and funding effective data management plans. The next IPY will have a rich heritage to build upon.