18 Reasons for Open Publication of Geoscience Data

Network Cables

By Lance McKee
Senior Staff Writer
Open Geospatial Consortium (OGC)
lancemckee@opengeospatial.org

Despite rapid advances in technical capabilities for data sharing, much of the data collected by Earth scientists (other than data from civil agencies’ satellite-borne imaging systems) is not easily available to other scientists. Because humanity faces critical environmental and resource challenges, the research community should take steps to make Earth location-referenced data much more discoverable, assessable, accessible and widely usable.

This article, the first in a series of three short articles, offers 18 reasons, or goals, that detail my understanding of this obligation. The next article will explain how open interface and encoding standards from the Open Geospatial Consortium (OGC) and other standards development organizations contribute to achieving these 18 goals. The third article will offer evidence that this change is well underway, gaining momentum, and inevitable, and it will include a few suggestions.

My point of view is that of an interested observer of science and someone with 16 years of participation in the OGC’s open geoprocessing standards initiative. I see parallels between the regime change that has occurred in the geospatial technology market and the regime change that is beginning in science. Often the most difficult obstacles to progress are institutional, financial and behavioral. Technical progress is often easier, and it often precedes and forces the obsolescence of old policies, arrangements and behaviors. I see that happening in Science. I think this is healthy for Science, and I want to make the geoscience community aware of the standards and the consensus standards process that I see as powerful agents for positive change.

Open data and open science are being discussed in various forums, such as the Open Knowledge Foundation’s “open-science” listserv, and my purpose in this article is not to frame a careful definition of either. I don’t address the thorny issues of ownership, copyright and privacy. I take the position that the current regime was shaped by technology in the late book publishing era and the early computer technology era, and the new regime will be shaped by 21st century information and communication technology. In this context, the thorny issues will get sorted out by people who understand the potentials of technology, cherish the principles of science more than the traditions and institutions of science, and recognize the urgent requirement for better science.

Reason 1: Data transparency

Science demands transparency regarding data collection methods, data semantics and processing methods. Data, and scientific rigor, need to be documented! Subsidiary to this reason is another: cross-checking between data collections for sensor accuracy.

Reason 2: Verifiability

Science demands verifiability. Any competent person should be able to examine a researcher’s data to see if those data support the researcher’s conclusions.

Reason 3: Useful unification of observations

Being able to characterize, in a standardized human-readable and machine-readable way, the parameters of sensors, sensor systems and sensor-integrated processing chains (including human interventions) enables useful unification of many kinds of observations, including those that yield a term rather than a number. 1

Reason 4: Cross-disciplinary studies

Diverse data sets with well-documented data models or application schemas can be shared among diverse information communities. (OGC defines an information community as a group of people, such as a discipline or profession, who share a common geospatial feature data dictionary, including definitions of feature relationships, and a common metadata schema.) Cross-disciplinary data sharing provides improved opportunities for cross-disciplinary studies.

Reason 5: Longitudinal studies

Archiving, publishing and preserving well-documented data yield improved opportunities for longitudinal studies. As data formats, data structures, and data models evolve, scientists will need to access historical data and understand the assumptions behind them so that meaningful scientific comparisons can be conducted. Community standards will help ensure long-term consistency of data representation. (Subsidiary to this reason is another: support for study and advancement of scientific ontologies.)

Reason 6: Re-use

Open data enables scientists to re-use or repurpose data for new investigations, reducing redundant data collection and enabling science to be done more efficiently.

Reason 7: Planning

Open data policies enable collaborative planning of data collection and publishing efforts to serve multiple defined and yet-to-be-defined uses.

Old Faithful, Yellowstone National Park, Wyoming, US

Reason 8: Return on investment

With open data policies, institutions and society overall will see greater return on their investment in research, most directly because of reasons 6, 7 and 17, but perhaps most significantly because of reason 15.

Reason 9: Due diligence

Open data policies will help research funding institutions perform due diligence and policy development because it will be easier to review researchers’ and research programs’ past performance with respect to data quality and metadata quality.

Reason 10: Maximizing value

The value of data increases with the number of potential users. This benefits science in a general way. It also creates opportunities for businesses that will collect, curate (document, archive, host, catalog, publish), and add value to data. (This is similar to Metcalfe’s law: “The value of a telecommunications network is proportional to the square [or, some would say, some positive exponent not always 2] of the number of connected users of the system.”)
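The scaling in that parenthetical can be made concrete with a small sketch. It assumes, purely for illustration, that value is proportional to the number of users raised to some positive exponent; the function names `relative_value` and `value_ratio` are hypothetical, and the code does not measure the value of any real data set.

```python
def relative_value(users: int, exponent: float = 2.0) -> float:
    """Value up to an unknown constant factor, modeled as users ** exponent.

    exponent=2.0 is the classic squared form; other positive exponents
    cover the "not always 2" case. This is an illustrative model only.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    return float(users) ** exponent


def value_ratio(users_before: int, users_after: int, exponent: float = 2.0) -> float:
    """How many times more valuable the resource becomes as its user base grows."""
    if users_before <= 0:
        raise ValueError("users_before must be positive")
    return relative_value(users_after, exponent) / relative_value(users_before, exponent)


if __name__ == "__main__":
    # Compare growth from 10 to 100 potential users under different exponents.
    for exponent in (1.0, 1.5, 2.0):
        print(exponent, value_ratio(10, 100, exponent))
```

Under the squared form, growing from 10 to 100 potential users multiplies the relative value by 100; with an exponent of 1 the same growth multiplies it by only 10.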

Reason 11: Data discoverability

Open data is discoverable data. Data are not efficiently discovered through literature searches or conventional search engines. Data registered in OGC standard catalogs using ISO-standard XML-encoded metadata enable efficient and fine-grained searches.
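As a rough sketch of what such a catalog search can look like, the following builds a key-value-pair GetRecords request of the kind defined by the OGC Catalogue Service for the Web (CSW) 2.0.2 interface. The endpoint URL and the search keyword are hypothetical, the helper name `build_getrecords_url` is mine, and whether a particular catalog accepts this exact combination of record type and ISO output schema is an assumption that would need checking against that catalog's capabilities document.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getrecords_url(endpoint: str, keyword: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords URL for a full-text keyword search.

    The request asks for summary records, encoded with the ISO 19139
    (gmd) metadata schema, whose text contains the keyword.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Simple full-text filter; a real client should escape quotes in keyword.
        "constraint": f"AnyText like '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical catalog endpoint, used only to show the shape of the request.
    url = build_getrecords_url("https://catalog.example.org/csw", "groundwater")
    print(url)
    # Decode the query string again to confirm the parameters round-trip.
    print(parse_qs(urlsplit(url).query)["constraint"])
```

The request is only constructed here, not sent. A catalog that supports it would return XML metadata records that a client can filter further by spatial extent, time period or other queryable properties.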

Reason 12: Data exploration

Robust data descriptions and quick access to data will enable more frequent and rapid exploration of data – “natural experiments (http://en.wikipedia.org/wiki/Natural_experiment)” – to explore hypothetical spatial relationships and to discover unexpected spatial relationships.

Reason 13: Data fusion

Open data improves the ability to “fuse” in-situ measurements with data from scanning sensors. This bridges the divide between communities using unmediated raw spatial-temporal data and communities using spatial-temporal data that is the result of a complex processing chain. 2

Reason 14: Service chaining

Open data (and open online processing services) will improve scientists’ ability to “chain” Web services for data reduction, analysis and modeling.

Reason 15: Pace of science

Open data enables an accelerated pace of scientific discovery, as automation and improved institutional arrangements give researchers more time for field work, study and communication.

Reason 16: Citizen science and outreach

Open science will help Science win the hearts and minds of the non-scientific public, because it will make science more believable and it will help engage amateur scientists – citizen scientists – who contribute to science and help promote science. It will also increase the quality and quantity of amateur scientists’ contributions. 3

Reason 17: Forward compatibility

Open science improves the ability to adopt and use new and better data storage, format, discovery, and transmission technologies as they become available. 4

Reason 18: Timely intervention

“Changes to the Earth that used to take 10,000 years now take three, one reason we need real-time science. … Governances must be able to see and act upon key intervention points.” 5

I welcome additions to this list.

The purpose of creating open geoprocessing interface and encoding standards has not been to create a revolution in scientific institutions’ policies, or in scientific publishing businesses or scientists’ workflows and incentive structures. But such standards will certainly contribute to this revolutionary change, as advances in geospatial interoperability become known and useful to people who sincerely care about the basic requirements and values of science.

I hasten to add that the opinions expressed here are my own and are not to be seen as official positions or policies of the Open Geospatial Consortium (OGC).

1 From an email exchange with Simon Cox, JRC Europe and CSIRO Australia, editor of ISO 19156 (Observations and Measurements), coordinator of OneGeology geoinformatics, a designer of GeoSciML, and chair of the OGC Naming Authority.

2 From an email exchange with Simon Cox.

3 From a conversation with Gordon Thompson, Executive Director, Institute for Resource and Security Studies (IRSS) and Research Professor, George Perkins Marsh Institute.

4 Offered to OGC’s David Arctur for this list on 6 January 2010 by Sharon LeDuc, Chief of Staff, NOAA’s National Climatic Data Center, Asheville, North Carolina, USA.

5 Brian Walker, Program Director, Resilience Alliance, and a scientist with the CSIRO, Australia.