A key step towards democratizing and ensuring efficient use of data is making the data “interoperable”
3 Jan, 2023
Afzalbek Fayzullaev, Hari Vishnu
Data has turned into one of humanity’s most valuable assets during the ongoing information age. As of 2017, data surpassed oil as the world’s most valuable resource. It assumes even more importance in light of the fact that the artificial intelligence/machine-learning-based technologies that define this century are data-hungry and require large volumes of well-sampled data in order to make better-informed decisions or inferences. From medical diagnosis and hazard protection to autonomous vehicles and navigation, the use of big data is ubiquitous in human lives.
It should come as no surprise, then, that the management, accessibility, and interoperability of data and its sources are key elements in humanity’s quest to improve its understanding of the Ocean, one of the key goals of the ongoing UN Decade of Ocean Science. Data is necessary to achieve all the desired outcomes of the Decade, and even more importantly, equitable access to data will be a necessary step towards its success, as explicitly embodied in outcome #6. As it stands, the Ocean is under-studied and requires more exploration for us to satisfactorily understand it, use it, and live with it harmoniously. For example, until 2015, only 5% of the Ocean floor was mapped to modern quality standards; that number has since improved to 23.4% (as of the latest announcement from the Seabed 2030 project at Lisbon). Given this, democratizing the use of Ocean data and ensuring widespread access to, and uptake of, existing tranches of data should be one of the priorities of the Decade. Indeed, as the information technology sector has shown us, concentration of data in the hands of a few changes the nature of competition and collaboration in a field.
A key step towards democratizing and ensuring efficient use of data is working towards making the data “interoperable”, possibly with coherent standards and with a clear understanding of how it can be interpreted across domains and different sources. This requires the different players in the Ocean domain, spanning academia, industry, and government agencies, to talk to each other: establish conversations, form bridges, create standards, and come to an understanding of how this can be done.
A panel session discussing this topic was held at the UN Ocean Conference in Lisbon, as a Side Event on Marine Data Interoperability. The opening remarks at the event by Ir. Jan-Bart Calewaert, Head of the EMODnet Secretariat, were straightforward and unwavering: marine knowledge is not possible if marine data is not managed in a way that it can “be shared and easily found and re-used by our engineers, scientists, policy advisors, and businesses.” He went on to note that marine data interoperability is vital to implementing UN Sustainable Development Goal 14 (SDG14). Calewaert, like other panelists later in the event, made it clear that no real change can be made without the digitization, sharing, and management of data, information, and digital knowledge of the Ocean, a daunting task that is slowly being set in motion.
The moderator of this panel, Dr. Pier-Luigi Buttigieg, digital knowledge steward and senior data scientist at the GEOMAR Helmholtz Center for Ocean Research, led the conversation on interoperability alongside three experts on stage and two joining online. Dr. Buttigieg highlighted how separated data is in the current world by comparing the panelists on stage with him to icebergs -
- Kate Larkin, the Deputy Head of the European Marine Observation and Data Network (EMODnet),
- Hans Wendt, Marine Programme Coordinator at IUCN, and
- Ben Williams, Metocean Director at Fugro
He said the panelists and the sectors they represent are like “separate very impressive icebergs” from all over the world: distinct sources, chunks, and stewards of data that need to be brought together in this conversation. In addition to the panelists on stage, the event showcased pre-recorded segments on interoperability by two additional virtual panelists:
- Sebastien Mancini, the Director of the Australian Ocean Data Network
- Tshikana Rasehlomi, Data Manager, Marine Information Management System / AfrOBIS Node
In the first of these segments, Mancini highlighted one of the key successes of the Integrated Marine Observing System (IMOS): all “physical, biological, and biogeographical data from the open Ocean and coastal areas” is openly and freely available. IMOS also makes its data-ingestion code open for others to view, further supporting interoperability. Rasehlomi followed immediately after Mancini by highlighting the importance of making Ocean data easily findable and shareable across people and nations, stating that open access to data encourages scientific collaboration.
Wide network collaboration - the EMODnet example
After the pre-recorded videos were played, the on-stage panelists introduced themselves and spoke about their interests in interoperability. Dr. Larkin began by introducing EMODnet as a European marine data service funded by the European Commission through its fisheries and aquaculture funding. On the subject of interoperability, she noted that data across all marine variables, including human activities, is accessible on EMODnet, along with added-value products. Her insight on the use of data was that there is a drive to reuse data - data should be collected once and used many times. Dr. Larkin also spoke about the collaboration of over 120 organizations in creating data standards, bringing all the data together, and seeing how it can be used for European policy. Lastly, with the user community expanding, she emphasized the importance of simplifying data and easing the user experience: centralizing metadata into a catalogue would let users pick a region of interest and then find the data there for a specific theme of interest.
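The centralized-catalogue idea Dr. Larkin describes can be sketched in a few lines: users pick a region and a theme of interest, and the catalogue returns matching dataset records. This is an illustrative toy, not EMODnet's actual API; all record fields and dataset titles below are hypothetical.

```python
# Toy sketch of a centralized metadata catalogue: filter dataset records
# by theme and by whether their location falls in a region of interest.
# Field names and entries are illustrative assumptions, not EMODnet's schema.

def in_bbox(record, bbox):
    """True if the record's point location lies inside (lon_min, lat_min, lon_max, lat_max)."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= record["lon"] <= lon_max and lat_min <= record["lat"] <= lat_max

def search(catalogue, bbox, theme):
    """Return all records matching both the theme and the region of interest."""
    return [r for r in catalogue if r["theme"] == theme and in_bbox(r, bbox)]

catalogue = [
    {"title": "Baltic bathymetry survey", "theme": "bathymetry", "lon": 19.0, "lat": 57.5},
    {"title": "North Sea wind records",   "theme": "physics",    "lon": 3.2,  "lat": 55.1},
    {"title": "Biscay seabed habitats",   "theme": "habitats",   "lon": -4.5, "lat": 45.8},
]

# Region of interest: roughly the Baltic Sea; theme: bathymetry
hits = search(catalogue, bbox=(9.0, 53.0, 30.0, 66.0), theme="bathymetry")
print([r["title"] for r in hits])  # ['Baltic bathymetry survey']
```

The point of centralizing the metadata is that this one query works across every contributing organization's holdings, rather than users visiting each data source separately.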
Engaging developing countries, and metadata
When asked by Dr. Buttigieg for examples of how EMODnet would talk to countries like South Africa or Fiji to bring them on board, Dr. Larkin replied that EMODnet has already taken steps to make data searchable, discoverable, and visualizable, and to allow machine-to-machine access as well. Like Wendt, she acknowledged that the current systems have limitations and there is still more to be done. To know what other groups around the world are doing, there needs to be at least a minimum set of metadata that is interoperable.
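One way to read "a minimum set of metadata that is interoperable" is as a checklist: before a record can be shared across groups, it must carry the core fields another group would need to discover and reuse the data. The sketch below assumes an illustrative required-field list; it is not a published standard, and every field name and record value is hypothetical.

```python
# Sketch of a "minimum interoperable metadata" check. The REQUIRED_FIELDS
# list is an illustrative assumption, loosely echoing common discovery
# metadata (what, who, where, when, under what terms, and how to access).

REQUIRED_FIELDS = {"title", "provider", "variables", "bbox", "time_range", "license", "access_url"}

def missing_metadata(record):
    """Return the set of required metadata fields absent from a record."""
    return REQUIRED_FIELDS - set(record)

record = {
    "title": "Fiji coastal temperature moorings",          # hypothetical dataset
    "provider": "Hypothetical Pacific Data Centre",
    "variables": ["sea_water_temperature"],
    "bbox": (176.0, -19.5, 180.0, -15.5),
    "time_range": ("2018-01-01", "2022-12-31"),
    "license": "CC-BY-4.0",
    "access_url": "https://example.org/fiji-moorings",
}

print(missing_metadata(record))                 # set() -> the minimum is met
print(missing_metadata({"title": "Untitled"}))  # every other required field is missing
```

A shared minimum like this is what lets one group's harvester know, mechanically, what another group around the world is holding.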
Relating to the discussion of metadata, the moderator also asked the panelists whether there are any agreements or forums on interoperability specifications. In response, Dr. Larkin said this currently happens in the context of IODE. The developing Ocean Information Hub, as well as the Ocean Decade itself, are good opportunities to scale it up.
Key players - the private sector
Ben Williams from Fugro, tapping into his 17 years of experience in the private sector, started off by noting how far data interoperability has come. In the past, sharing data meant “carrying a floppy disk across the Ocean to a person behind a desk”. So although it can be easy to criticize the current state of interoperability, we must appreciate how far we have come over the past twenty years. Williams also highlighted recent actions by Fugro, such as a partnership it has announced with EMODnet, as well as past contributions to Seabed 2030 and the donation of two million square kilometers of bathymetric data collected between project sites. Notably, this data is made publicly available through GEBCO and the Nippon Foundation. Fugro and EMODnet will continue to work together, not only in the European sector but beyond.
Williams highlighted that in industry, it does not make sense to spend a lot of money collecting data and then give it away for free [in the interest of the clients], even going as far as to say, “That would be crazy.” While Fugro keeps the data private and has its clients’ interests in mind, he explained, it also tries to de-sample and subsample the data and release it to the public at an agreed resolution. He gave an example of how Fugro data has been used by others - for instance, the International Hydrographic Organisation used Fugro-collected data in UK hydrographic work.
Williams closed his discussion by mentioning a technical change Fugro could make to its data collection. He questioned why so much of a carbon footprint should be incurred installing and retrieving thirty tide gauges when the gauges could simply stay in place. This would let them record for longer periods and gather more data, without the extra carbon footprint of uninstalling them.
Engaging more parties in the Private sector
Subsequently, the moderator asked Ben Williams how the value of interoperability can be pitched to other organizations like Fugro to bring them to the negotiating table, and how negotiations in such areas can take place. Williams responded that they would have to play within the frameworks already in place, meaning that private groups such as Fugro would have to engage with clients, whether for sustainability goals, contribution to science, or stakeholder engagement. Williams went as far as to say that they might even go to fishermen to show how data on wind can benefit them. To better control how data is used, he said, one should put the data’s further uses into context and understand them, so that the clients of the private sector are more comfortable sharing the data.
Buttigieg then predicted that within the decade, we should reach a point where, for any particular required set of data, people can point to someone who holds that data and reach an agreement on its fair use.
Williams on private-sector involvement in creating specifications
When asked whether he had been involved in creating the infrastructure and designing the specifications in the private sector, Williams said that he was, to an extent. He noted that the data we want to make interoperable and open is not sitting in archives but in the competitive landscape of Ocean observers. This means that the data creators, not necessarily the end users, are where the interoperability discussion should start.
Multiscale/multisystem data and vertical interoperability of data
Following Williams, Wendt gave a look at what interoperability means at the regional, national, and community levels. He outlined what interoperability requires: data should be well-managed, accessible, and usable for many different outcomes. At the regional level, Wendt collaborates with the council of CROP agencies, the secretariat for environmental planning, and Pacific communities, in addition to universities in the area. Moving on to the national level, Wendt gave the example of how he and IUCN work hand in hand with CROP agencies to build data systems so that the data can be adequately used for planning. At the community level, data interoperability is “another story”: he had no problem admitting that no system is in place, and the only alternative is to physically go to each place to get the data, or to provide the data in translated form.
After this, Dr. Buttigieg asked Wendt how agreement is reached between the regional and global scales on the standards that must be developed around the data to make it interoperable. Wendt answered that once leaders agree in meetings such as the Pacific Leaders Meeting, the decisions trickle down to the regional levels. A limitation of this basic existing format is that the different systems do not necessarily talk to each other at the moment.
Amidst the talk of being able to share, find, and reuse data around the world, the moderator also asked the panelists whether there is a common quality-control standard their data must meet to ensure it is technically and substantively sound. To answer this, Dr. Larkin gave an example of how EMODnet responded when, five years ago in Europe, there was an immense amount of data that could not be used because it wasn’t “harmonized.” In response, EMODnet spoke with different stakeholders and collaborated on best practices and conventions in order to harmonize the data. Now, EMODnet is able to quality-control and categorize data within the metadata, thanks to thematic experts and technology experts who came to an understanding to create the European standard. Scaling outward to the global level, Dr. Larkin also mentioned that since 2021, under the IOC’s OBPS, the data is open not only to European users but to global users as well.
Coming at the original question about quality control from a different perspective, Williams said that quality control should flag the data rather than delete it. For example, there are environmental risks whose quality or usability can be flagged based on ‘rate of change’ studies. The metadata can record which flagging standards have been used, so that data is flagged rather than cleared. The moderator agreed, saying that “you don’t necessarily need the cleanest data always.”
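Williams's "flag, don't delete" idea can be sketched concretely: a rate-of-change check marks suspect samples but removes nothing, and the metadata records which flagging standard was applied so downstream users can decide for themselves. The threshold, flag codes, and values below are illustrative assumptions, loosely modelled on common oceanographic QC flag schemes rather than any specific standard.

```python
# Sketch of flag-rather-than-delete quality control. GOOD/SUSPECT codes
# and the max_step threshold are illustrative assumptions.

GOOD, SUSPECT = 1, 3  # hypothetical flag codes

def rate_of_change_qc(values, max_step):
    """Flag a sample SUSPECT if it differs from its predecessor by more than max_step."""
    flags = [GOOD]  # the first sample has no predecessor to compare against
    for prev, cur in zip(values, values[1:]):
        flags.append(SUSPECT if abs(cur - prev) > max_step else GOOD)
    return flags

temps = [14.1, 14.2, 14.3, 19.8, 14.4]   # one physically implausible jump
flags = rate_of_change_qc(temps, max_step=1.0)

dataset = {
    "values": temps,   # nothing is deleted - all samples survive QC
    "flags": flags,    # [1, 1, 1, 3, 3] - the jump and the jump back are both flagged
    "metadata": {"qc_standard": "rate-of-change, max_step=1.0 (illustrative)"},
}
print(dataset["flags"])
```

Because the flagging standard travels in the metadata, a user who does not need the cleanest data can simply ignore the flags, while a stricter user can filter on them.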
The event on Marine Data Interoperability then had two video contributions, the first from Kevin O’Brien, a Senior Research Scientist at the University of Washington and a member of the Data Integration Group at NOAA’s Pacific Marine Environmental Lab. He focused on data activities at GOOS (the Global Ocean Observing System) and the OCG (Observation Coordination Group). The OCG mapped data flow across its networks in hopes of finding gaps and opportunities in its data and metadata services towards FAIR compliance. After the first iteration of the GOOS OCG data mappings was done, the maps were validated by data teams; next steps include extending the data mappings and developing OCG data recommendations and an implementation strategy.
O’Brien listed five approaches that he has found to work:
- developing a strong and active data management team,
- developing best practices for data workflows,
- focusing on existing standards and conventions already widely in use,
- supporting data producers with tools that allow for flexibility,
- integrating flexible and interoperable data services which support both human and machine-to-machine access.
Where to start, and a pitch for funding
To add to this, O’Brien said one revolutionary change would be a change in the funding paradigm for data management. Data management has too often been funded as part of science projects, which means only the most basic data-management activities get funded, and not the higher-level services that are needed. There is a need to support the development of digital ecosystems to underpin the Ocean Decade.
When it was time to take questions from the audience, Steve from the Scripps Institution of Oceanography data coordination committee pitched that it should be possible for us to make a complete inventory of everything from the past twenty years if there were proper funding.
We summarize the takeaways from this panel below:
- Interoperability is important for marine data. No real change can be made without the digitization, sharing, and management of data, information, and digital knowledge of the Ocean.
- The first step towards interoperability is knowing it is possible. A certain percentage of Ocean observations are already out there, waiting to be made interoperable; the problem not yet being addressed is where to start.
- Funding is an important missing link, and the funding paradigm needs to change. To quote Vladimir Ryabinin (Executive Secretary of the Intergovernmental Oceanographic Commission), the Ocean covers 70% of Earth’s surface but gets only about 1.7% of global spending on research and development.
- Private sector holds large tranches of data too. They would not necessarily see any benefit in giving it away for free, but they can be convinced to talk to stakeholders to make parts of it available under agreements that are mutually beneficial.
- Multilevel interoperability is possible, but gets trickier at the bottom layers. Top-down trickle of decisions can be made to work by getting governments and higher-level authorities to agree upon terms.
- Data quality control does not have to delete or filter the data, merely flag it, with suitable metadata suggesting the standards used for flagging, so that it can still be used across domains or parties with differing needs.
- Focus on a strong data team, develop best practices, and use existing standards.
To wrap up this segment of the event, the moderator summarized that there are still a few things to do. The structure for interoperability exists, but the operational layer has to work not only across the public and private sectors but also for groups that put their own data online. In doing so, we will eventually have a machine-driven system for finding the data.
The next segment of the event brought on stage another panel of experts to discuss their thoughts. We discuss this segment in the next part of this article.