Haiti Earthquake: Harmonizing post-event distributed data processing

Joan Masó (joan.maso@uab.cat), Xavier Pons (xavier.pons@uab.cat), Bastian Schäffer (schaeffer@uni-muenster.de), Theodor Foerster (theodor.foerster@uni-muenster.de) and Roberto Lucchi (RLucchi@esri.com)

Introduction

Figure 1: OpenStreetMap of Port-au-Prince. OpenStreetMap data includes roads, boundaries, transportation resources, water and health infrastructures, medical facilities, and ad hoc settlements (refugee camps).

The earthquake in Haiti in January 2010 was one of the most destructive events of this century. It caused 112,405 deaths and left 482,349 people homeless (OCHA Haiti situation update map, Jan. 31, 2010). The devastation shown by the media around the world generated shock and consternation among the general public. Response teams and humanitarian aid organizations urgently needed information to coordinate their efforts on the ground. The official response came from several international bodies, such as the United Nations Institute for Training and Research (UNITAR) Operational Satellite Applications Programme (UNOSAT), the International Charter Space and Major Disasters, and the United Nations Platform for Space-based Information for Disaster Management and Emergency Response (UN-SPIDER). In addition, unofficial amateur groups with common interests, using tools such as OpenStreetMap (Figure 1) and Google Map Maker, produced cartographic products much faster than the official sources could. These efforts swelled the flow of data in one of the most compelling manifestations of the crowdsourcing phenomenon ever seen. Meanwhile, websites gathered most of the available geospatial information, mainly satellite images and old U.S. thematic maps, which were used as basemaps to produce new damage assessment maps for the immediate response to the crisis.

A testbed was developed by the Open Geospatial Consortium (OGC) so that the collected data could be distributed across organizational boundaries while remaining interoperable, and to provide tools for analyzing such heterogeneous datasets and organizing the information for decision-making. Although the testbed pilot was based on the Haiti event, its context and results are relevant to the management of future post-disaster situations. The OGC regularly carries out interoperability experiments that test current standards, propose improvements to existing ones and create new standards as necessary. The OGC Web Services Phase 7 testbed (OWS-7) started at the beginning of 2010, and several experiments were run against realistic scenarios such as the recent Haiti earthquake. Real raster and vector datasets of the Haiti area were made available, processed and visualized by different services and clients in the network to support decision-making. Several organizations provided services and clients in OWS-7. One of the main objectives was to combine the discovery and integration of unstructured data sources (e.g., in-situ sensor data, tables with positional information, RSS feeds, GeoPDFs) with more structured information, such as remotely sensed imagery or accurate thematic maps, that comes in standardized formats and is provided by standardized web services and catalogues. Another use case focused on applying common vector analysis techniques in a heterogeneous processing service environment to determine appropriate locations for temporary refugee camps. The results of these interoperability experiments are available as a collection of reports and as a set of videos describing the scenarios used in the demonstrations.

The Web Processing Service standard

In the past, decision-making was supported by classic GIS tools working on local data. With the advent of the Internet, this approach can now be complemented with interoperable web-based geoprocessing services. Distributed geoprocessing services are useful for several reasons: users may lack the appropriate software or expertise; they may need more computing power than average hardware provides; web services can give access to copyrighted algorithms that are not available in any other form; and the data needed by a process may be connected directly to the service, either because they are not publicly available to the user or because they cannot be downloaded in a reasonable time.

The main standard for distributed geoprocessing is the OGC Web Processing Service (WPS). It defines a common interface that makes it easy to publish geospatial processes and for web clients to discover and execute them. A WPS can be configured to offer any sort of GIS functionality to clients across a network, including access to pre-programmed calculations and/or computation models that operate on spatially referenced data. A WPS may offer calculations as simple as subtracting one set of spatially referenced numbers from another (e.g., determining the difference between the number of influenza cases in two different seasons), or as complicated as a global climate change model. The data required by the WPS can be delivered across a network or be directly available on the server side.
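For orientation, the following sketch builds the three basic WPS 1.0.0 requests (GetCapabilities for discovery, DescribeProcess for a process's interface, and Execute) in their key-value-pair (HTTP GET) form. The endpoint, the process identifier "Buffer" and its "distance" input are hypothetical; real services publish their own identifiers, and XML-encoded POST requests are the other common way to call Execute.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint, used only for illustration.
WPS_ENDPOINT = "https://example.org/wps"


def get_capabilities_url(endpoint: str) -> str:
    """Discovery: ask the service which processes it offers."""
    params = {"service": "WPS", "request": "GetCapabilities", "AcceptVersions": "1.0.0"}
    return f"{endpoint}?{urlencode(params)}"


def describe_process_url(endpoint: str, identifier: str) -> str:
    """Ask for the inputs, outputs and supported formats of one process."""
    params = {"service": "WPS", "version": "1.0.0",
              "request": "DescribeProcess", "identifier": identifier}
    return f"{endpoint}?{urlencode(params)}"


def execute_url(endpoint: str, identifier: str, inputs: dict) -> str:
    """Run a process with literal inputs, encoded as name=value pairs joined by ';'."""
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    params = {"service": "WPS", "version": "1.0.0", "request": "Execute",
              "identifier": identifier, "DataInputs": data_inputs}
    return f"{endpoint}?{urlencode(params)}"


def query_params(url: str) -> dict:
    """Decode a request URL back into its parameters (handy for checking what is sent)."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


if __name__ == "__main__":
    print(get_capabilities_url(WPS_ENDPOINT))
    print(describe_process_url(WPS_ENDPOINT, "Buffer"))
    print(query_params(execute_url(WPS_ENDPOINT, "Buffer", {"distance": "500"})))
```

`urlencode` percent-encodes the `=` and `;` inside DataInputs, so the whole input list travels as one parameter value.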

Testing web processing services after the Haiti earthquake

Several WPS implementations, such as deegree, PyWPS, the ZOO Project and 52°North, have been deployed and were available for the interoperability experiments. With all these processing services in place, we found that the main obstacle to interoperability between particular WPS instances is the very flexibility of the WPS standard. Indeed, it is possible, and happens frequently, that service providers expose a WPS process in equivalent but different ways; for instance, they may require different sets of parameters and data types. This is one of the challenges that must be addressed in order to implement client applications that can apply any processing operation available in the WPS network. The problem is even bigger when we try to combine crowdsourced data with remote sensing and classical GIS data formats, as occurred in the case of the Haiti earthquake.
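To make the problem concrete, suppose (hypothetically) that two servers expose the same buffer operation with different identifiers, input names and distance units. Without a shared profile, a client needs a per-server adapter like the one sketched here; every identifier and parameter name is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class VendorBuffer:
    """How one (hypothetical) server exposes a buffer operation."""
    identifier: str               # the vendor's process identifier
    input_names: dict             # common input name -> vendor-specific input name
    metres_per_unit: float = 1.0  # the vendor's distance unit expressed in metres

    def translate(self, common_inputs: dict) -> tuple:
        """Map inputs written against a common interface to this vendor's parameters."""
        vendor_inputs = {}
        for name, value in common_inputs.items():
            if name == "Distance":
                value = value / self.metres_per_unit  # metres -> vendor unit
            vendor_inputs[self.input_names[name]] = value
        return self.identifier, vendor_inputs


# Two equivalent but differently exposed buffer processes (all names invented).
SERVER_A = VendorBuffer("BufferFeatures",
                        {"InputFeatures": "features", "Distance": "dist_km"},
                        metres_per_unit=1000.0)
SERVER_B = VendorBuffer("GeoBuffer",
                        {"InputFeatures": "InputData", "Distance": "BufferDistanceMeters"})

if __name__ == "__main__":
    request = {"InputFeatures": "wells.gml", "Distance": 500.0}
    print(SERVER_A.translate(request))
    print(SERVER_B.translate(request))
```

A profile removes the need for such adapters by fixing identifiers, parameter names and units for every implementation in advance.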

WPS profiles have been proposed as a way to address these interoperability problems. However, little work has been done on defining what a profile is and how it can be used. The OWS-7 interoperability experiments showed that, in practice, a profile consists of a set of operations carefully described in a document. The content of this document can follow the Reference Model of Open Distributed Processing (RM-ODP), which divides the design of distributed processing systems into five viewpoints: enterprise, information, computational, engineering and technology.

A WPS profile document first describes the analytical scope of the profile as a whole (enterprise viewpoint) and the data models it addresses (information viewpoint). It then lists the operations that an implementation of the profile has to perform, with a detailed description of their functionality and internal behaviour and of the names and definitions of their input and output parameters (computational viewpoint). Finally, it enumerates the formats supported by these input and output parameters (technology viewpoint).
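As an illustration only, the parts of a profile document can be modelled as a small data structure: the URN and enterprise scope, the information-viewpoint data models, and, per operation, a description plus the formats each input must accept. The helper checks whether a candidate process meets the input requirements. All class names, the URN and the format strings are hypothetical; a real profile is a prose document with reference DescribeProcess examples, not code.

```python
from dataclasses import dataclass, field

GML = "text/xml; subtype=gml/3.1.1"  # illustrative format string


@dataclass
class OperationSpec:
    description: str                            # computational viewpoint
    inputs: dict = field(default_factory=dict)  # input id -> required formats; empty set = literal value
    outputs: dict = field(default_factory=dict)  # output id -> guaranteed formats


@dataclass
class WPSProfile:
    urn: str                                         # identifies the profile document
    scope: str                                       # enterprise viewpoint
    data_models: list = field(default_factory=list)  # information viewpoint
    operations: dict = field(default_factory=dict)   # operation id -> OperationSpec


def unmet_requirements(profile: WPSProfile, operation: str, offered_inputs: dict) -> list:
    """List how a candidate process (input id -> accepted formats) falls short of the profile."""
    spec = profile.operations[operation]
    problems = []
    for name, formats in spec.inputs.items():
        if name not in offered_inputs:
            problems.append(f"missing input {name}")
        elif not formats <= offered_inputs[name]:
            missing = sorted(formats - offered_inputs[name])
            problems.append(f"input {name} lacks formats {missing}")
    return problems


PROFILE = WPSProfile(
    urn="urn:example:wps-profile:vector-analysis:1.0",  # hypothetical URN
    scope="Vector analysis for post-disaster site selection",
    data_models=["GML simple features"],
    operations={"Buffer": OperationSpec(
        description="Buffer every input feature by a distance",
        inputs={"InputFeatures": {GML}, "Distance": set()},
        outputs={"Result": {GML}})},
)
```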

Each profile is given a Uniform Resource Name (URN) that uniquely identifies it; the URN is included in the profile document and in the name resolver catalogue of the naming authority (NA) that issued it. The OGC has its own NA and its own resolver: urn.opengis.net. The computational and technology viewpoints cover almost all of the details found in a DescribeProcess XML response; therefore, the profile document also includes a reference DescribeProcess response example for each process. Every WPS process that implements the profile has to reference the URN of the profile document. These conclusions are outlined in a public engineering report.
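Assuming WPS 1.0.0, where a process description may list the application profiles it complies with in Profile elements, a client could use a sketch like this to find the processes that declare a given profile URN. The URN and the abridged DescribeProcess response are invented for illustration, and the parser matches local element names so that it does not depend on namespace prefixes.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile URN; real profiles obtain theirs from a naming authority.
PROFILE_URN = "urn:example:wps-profile:vector-analysis:1.0"

# Abridged, invented DescribeProcess response describing two processes.
SAMPLE = """<wps:ProcessDescriptions xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1">
  <ProcessDescription>
    <ows:Identifier>Buffer</ows:Identifier>
    <ows:Title>Buffer</ows:Title>
    <Profile>urn:example:wps-profile:vector-analysis:1.0</Profile>
  </ProcessDescription>
  <ProcessDescription>
    <ows:Identifier>Reclassify</ows:Identifier>
    <ows:Title>Reclassify</ows:Title>
  </ProcessDescription>
</wps:ProcessDescriptions>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part that ElementTree puts in front of qualified tags."""
    return tag.rsplit("}", 1)[-1]


def profiles_by_process(xml_text: str) -> dict:
    """Map each process identifier to the profile URNs it declares."""
    result = {}
    for desc in ET.fromstring(xml_text).iter():
        if local_name(desc.tag) != "ProcessDescription":
            continue
        children = list(desc)
        identifier = next((c.text.strip() for c in children
                           if local_name(c.tag) == "Identifier"), None)
        result[identifier] = [c.text.strip() for c in children
                              if local_name(c.tag) == "Profile"]
    return result


def processes_implementing(xml_text: str, urn: str) -> list:
    return [pid for pid, urns in profiles_by_process(xml_text).items() if urn in urns]


if __name__ == "__main__":
    print(processes_implementing(SAMPLE, PROFILE_URN))
```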

In OWS-7, this profiling approach was applied to develop a common profile for a set of well-known topological predicates (equals, disjoint, intersects, touches, crosses, within, contains and overlaps) and vector operations. In addition, the buffer, difference, clip, intersect and reclassification operations were also tested and profiled. Data came mainly from three sources: WFS servers, files on the web (particularly from the crowdsourcing environment) and the results of other WPS executions. Because the profile was designed so that the outputs of one WPS operation had the same data type and format as the inputs of other processes, different operations could be chained into a complete workflow that could assist decision-making in a disaster analysis situation. The profile was tested with three implementations: Intergraph, 52°North and MiraMon. More details about the process of generating profiles and the difficulties observed can be found in the feature and statistical analysis public engineering report.

Some problems lay outside the scope of the WPS standard, in current implementations of WFS/WPS servers and in GML encodings. In particular, some of the WFS and WPS servers tested could not read or write valid GML Simple Features Level 0 files, so the testbed had to deal with a wide variety of GML types and versions. Other problems included different interpretations by different vendors of the coordinate order of the EPSG:4326 coordinate reference system (CRS), the use of many different CRSs, and incorrect or missing references to GML application schemas.
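The EPSG:4326 axis-order problem can be guarded against on the client side. The sketch assumes a widespread convention rather than a rule stated in any one standard: the URN and HTTP-URI identifiers for EPSG:4326 are read latitude first, as the EPSG definition specifies, whereas the short "EPSG:4326" form is read longitude first, as many legacy servers did. A rough bounding box around Haiti (approximate values chosen only for this sanity check) then catches swapped coordinates.

```python
# Assumption: these identifiers are read latitude first (EPSG axis order).
LAT_FIRST = {"urn:ogc:def:crs:EPSG::4326", "http://www.opengis.net/def/crs/EPSG/0/4326"}
# Assumption: these are read longitude first (CRS84 by definition, short EPSG form by legacy habit).
LON_FIRST = {"EPSG:4326", "urn:ogc:def:crs:OGC:1.3:CRS84"}

# Rough box around Haiti: min lon, min lat, max lon, max lat (approximate).
HAITI_BBOX = (-75.0, 17.5, -71.0, 20.5)


def to_lon_lat(coords: list, crs: str) -> list:
    """Return coordinates in (longitude, latitude) order, whatever the source order."""
    if crs in LAT_FIRST:
        return [(lon, lat) for lat, lon in coords]
    if crs in LON_FIRST:
        return [(lon, lat) for lon, lat in coords]
    raise ValueError(f"unknown axis order for CRS {crs!r}")


def inside_bbox(lon: float, lat: float, bbox: tuple = HAITI_BBOX) -> bool:
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


if __name__ == "__main__":
    # Port-au-Prince, delivered latitude first by a hypothetical WFS.
    points = to_lon_lat([(18.54, -72.34)], "urn:ogc:def:crs:EPSG::4326")
    print(points, inside_bbox(*points[0]))
```

A point with swapped axes falls far outside the box, which makes mixed conventions easy to detect before the data enter a processing chain.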

Figure 2: The workflow created with the ArcGIS Model Builder illustrates a graphical environment in which different data sources (datasets and feature services; represented in dark blue circles) are connected to WPS operations (represented in brown boxes) and chained through the WPS outputs (new data represented in light blue). The aim of this demonstration workflow is to determine regions that would be good locations for refugee camps. The environment hides most of the complexity of the WPS protocol, and therefore the analyst/operator can concentrate their attention on the data and the workflow.

In a WPS, clients and services communicate by exchanging XML documents. It is not easy to generate XML request documents, send the requests and interpret the responses without tools that mask the complexity of the protocol. This is particularly true for emergency response operators, who work under stressful conditions and need a good, simple user interface to obtain results quickly and provide rapid mapping products that can help people working on the ground to save lives. Generic tools that can handle every available WPS are difficult to implement because of the enormous variety of operations a WPS can execute. Moreover, in many cases atomic executions must be chained, and that chaining must also be managed, to obtain the results of an analytical study. During OWS-7, a WPS client was also developed so that processing operations could be invoked, and their results reused, within a workflow. Rather than a command-line interface, it offers a graphical interface for defining workflows in which datasets and other data (represented by dark blue circles) are connected to operations (represented by squares) to generate new datasets (represented by light blue circles) (Figure 2). These can in turn be used in new operations. When the workflow model is complete, its coherence is checked, the process is executed, the state of the whole workflow is shown and the final results are delivered. The workflow can easily be revisited, partially modified and re-executed with new data.
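The coherence check and execution described above can be sketched as a small directed-acyclic-graph runner using Python's standard `graphlib` module (Python 3.9 or later). This is not the OWS-7 client: the step functions are plain Python callables standing in for remote WPS executions, and the two checks (undefined inputs and circular dependencies) are assumptions about what "coherence" involves.

```python
from graphlib import TopologicalSorter, CycleError


def run_workflow(steps: dict, sources: dict) -> dict:
    """Check a workflow for coherence, then execute its steps in dependency order.

    steps:   step name -> (callable, [names of its inputs])
    sources: dataset name -> data (e.g. features fetched from a WFS)
    """
    known = set(sources) | set(steps)
    # Coherence check 1: every input must be a source dataset or another step's output.
    for name, (_, inputs) in steps.items():
        unknown = [i for i in inputs if i not in known]
        if unknown:
            raise ValueError(f"step {name!r} uses undefined inputs {unknown}")
    # Coherence check 2: static_order() raises CycleError on circular dependencies.
    graph = {name: set(inputs) for name, (_, inputs) in steps.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = dict(sources)
    for name in order:
        if name in steps:  # source datasets also appear in the order; skip them
            func, inputs = steps[name]
            results[name] = func(*(results[i] for i in inputs))
    return results


if __name__ == "__main__":
    out = run_workflow(
        {"both": (lambda a, b: a & b, ["a", "b"]),
         "final": (lambda x: x - {3}, ["both"])},
        {"a": {1, 2, 3}, "b": {2, 3, 4}},
    )
    print(out["final"])
```

Keeping every intermediate result in `results` mirrors the graphical client's ability to show the state of the whole workflow and to re-run it with new source data.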

A complete test involving three implementations (Intergraph, 52°North and MiraMon) and several WFS data sources was performed to determine appropriate locations for a refugee camp. In the Figure 2 example, WFS servers provide the input data: land-use areas (polygons), floodplain areas (polygons), crime zones (polygons), functional medical facilities (points) and uncontaminated wells (points). First, the land-use categories were reclassified (1st WPS) to obtain the suitable areas, from which floodplains (2nd WPS) and crime zones (3rd WPS) were excluded. Because a refugee camp has to be placed near medical facilities and good water supplies, buffer operations with reasonable threshold distances were applied to these datasets (4th and 5th WPS) to obtain the areas that are close enough. These areas were successively intersected (6th and 7th WPS) with the results of the previous executions, and the final GML dataset is shown in Figure 3, combined with WMS satellite imagery as a background.
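The seven chained steps can be mimicked with plain set operations, assuming (purely for illustration) that every layer has been rasterized onto a common grid so that areas become sets of cells and a buffer becomes a square neighbourhood. The OWS-7 services operated on GML vector features instead; the grid, land-use classes and radii here are hypothetical.

```python
Cell = tuple[int, int]


def reclassify(land_use: dict[Cell, str], suitable: set[str]) -> set[Cell]:
    """Keep the cells whose land-use class is considered suitable."""
    return {cell for cell, land_class in land_use.items() if land_class in suitable}


def buffer(points: list[Cell], radius: int) -> set[Cell]:
    """Square buffer on the grid (a simplification of a Euclidean buffer)."""
    return {(x + dx, y + dy)
            for x, y in points
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def camp_candidates(land_use, suitable, floodplain, crime, medical, wells,
                    medical_radius: int, water_radius: int) -> set[Cell]:
    good = reclassify(land_use, suitable)           # 1st WPS: reclassify land use
    good = good - floodplain                        # 2nd WPS: exclude floodplains
    good = good - crime                             # 3rd WPS: exclude crime zones
    near_medical = buffer(medical, medical_radius)  # 4th WPS: buffer medical facilities
    near_water = buffer(wells, water_radius)        # 5th WPS: buffer uncontaminated wells
    good = good & near_medical                      # 6th WPS: intersect
    return good & near_water                        # 7th WPS: intersect


if __name__ == "__main__":
    # 5 x 5 toy grid: forest in column 0, grassland elsewhere.
    land_use = {(x, y): ("forest" if x == 0 else "grassland")
                for x in range(5) for y in range(5)}
    print(sorted(camp_candidates(land_use, {"grassland"},
                                 floodplain={(x, 0) for x in range(5)},
                                 crime={(4, 4)}, medical=[(2, 2)], wells=[(3, 3)],
                                 medical_radius=1, water_radius=1)))
```

Because every step consumes and produces the same kind of object (a set of cells), any step's output can feed any later step, which is the property the profile secured for GML outputs.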

Figure 3: Results of the previous demonstration workflow showing the regions that would be good locations for a refugee camp as polygons that are portrayed in one of the integrated clients (example created by ESRI).

Even within a single profile, implementations can still differ in ways that matter for fast and reliable decision-making. In principle, two independent implementations of the same WPS profile are interoperable and interchangeable; nevertheless, their results can be very different. Even when following the same profile, one implementation can be slower than another and therefore deliver its results too late in a crisis.
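A small worked example of how two conforming implementations can return different results: the sketch assumes two hypothetical buffer implementations that approximate a circle by a regular polygon with its vertices on the true circle, differing only in the number of vertices. Both would satisfy a profile that says "buffer by distance d", yet the areas they return differ.

```python
import math


def buffer_point(cx: float, cy: float, radius: float, n_vertices: int) -> list:
    """Approximate a circular buffer by a regular polygon whose vertices lie on the circle."""
    return [(cx + radius * math.cos(2 * math.pi * k / n_vertices),
             cy + radius * math.sin(2 * math.pi * k / n_vertices))
            for k in range(n_vertices)]


def polygon_area(ring: list) -> float:
    """Shoelace formula for a simple polygon given as an unclosed list of vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


if __name__ == "__main__":
    radius = 500.0  # metres, e.g. a distance threshold around a well
    exact = math.pi * radius ** 2
    for n in (8, 32, 128):
        area = polygon_area(buffer_point(0.0, 0.0, radius, n))
        print(f"{n:4d} vertices: {area:12.1f} m2, "
              f"{100 * (1 - area / exact):.2f}% smaller than a true circle")
```

With 8 vertices the buffer is about 10% smaller than the true circle, and with 32 vertices about 0.6% smaller; downstream intersections inherit such differences.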

Moreover, when an implementation runs on a publicly available server, several executions can be requested at the same time. If the service is not designed for this, its performance can degrade, although several solutions to this problem exist. Results can also differ depending on the internal algorithm implemented, the precision of the calculations and the approximations the developer used. Their quality can further depend on the output format and on the completeness and precision of the metadata provided with the data (ISO 19115 core metadata, lineage, quality propagation estimators, etc.). To improve the user experience with long processes, asynchronous executions are necessary: the client can query the state of the process and, ideally, receive a complete status report and an estimate of the time to completion.
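A minimal sketch of the client side of asynchronous execution, assuming WPS 1.0.0 conventions: the Execute request asks for storeExecuteResponse="true" and status="true", and the server returns an ExecuteResponse whose statusLocation document the client polls. The parser reads the wps:Status child (ProcessAccepted, ProcessStarted with its optional percentCompleted attribute, ProcessPaused, ProcessSucceeded or ProcessFailed). The fetch function is injected so the sketch runs without a network; in practice it would perform an HTTP GET on statusLocation. The time-to-completion estimate is a naive linear extrapolation chosen for illustration, not something the standard defines.

```python
import time
import xml.etree.ElementTree as ET

WPS = "{http://www.opengis.net/wps/1.0.0}"
STATES = {"ProcessAccepted", "ProcessStarted", "ProcessPaused",
          "ProcessSucceeded", "ProcessFailed"}
FINAL = {"ProcessSucceeded", "ProcessFailed"}


def parse_status(xml_text: str) -> tuple:
    """Return (state, percentCompleted or None) from an ExecuteResponse document."""
    status = ET.fromstring(xml_text).find(f"{WPS}Status")
    if status is None:
        raise ValueError("response has no wps:Status element")
    for child in status:
        state = child.tag.replace(WPS, "")
        if state in STATES:
            pct = child.get("percentCompleted")
            return state, (int(pct) if pct is not None else None)
    raise ValueError("no recognised status element")


def poll(fetch, interval: float = 5.0, max_polls: int = 100, sleep=time.sleep) -> list:
    """Call fetch() (e.g. a GET on statusLocation) until the process finishes."""
    history = []
    for _ in range(max_polls):
        state, pct = parse_status(fetch())
        history.append((state, pct))
        if state in FINAL:
            return history
        sleep(interval)
    raise TimeoutError("process did not finish within max_polls")


def estimate_remaining(elapsed_s: float, percent):
    """Naive linear extrapolation of the seconds left; None if no progress is reported."""
    if not percent:
        return None
    return elapsed_s * (100 - percent) / percent


def make_status_doc(state: str, pct=None) -> str:
    """Build a minimal, illustrative ExecuteResponse (not a complete document)."""
    attr = f' percentCompleted="{pct}"' if pct is not None else ""
    return ('<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0">'
            f'<wps:Status><wps:{state}{attr}>info</wps:{state}></wps:Status>'
            '</wps:ExecuteResponse>')
```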

Conclusions

WPS operations in a distributed environment cannot solve the problem on their own. To react rapidly in an emergency, both Earth observation data and in situ data are needed. Fortunately, several initiatives help decision-makers obtain the necessary information. In addition to UNOSAT and the International Charter Space and Major Disasters mentioned above, there are GMES and all the participants in GEOSS, which offer an integrated infrastructure for distributing Earth observation data as well as a set of data sharing principles used by almost 100 countries and organizations. The GEOSS Clearinghouse catalogues thousands of datasets that are useful for many environmental problems, particularly in emergency situations. Nevertheless, the GEOSS infrastructure includes only a very limited number of distributed processing and analytical tools. The number of processing services is expected to increase geometrically in the near future. The strategies described in this article will help improve the interoperability of current GEOSS tools, facilitate the creation of new ones and increase the usability of the data.

The authors gratefully acknowledge the support of the sponsors of the OWS-7 testbed, the OGC, and the European Commission through the FP7-242390-GEO-PICTURES project (SPACE-2009-1).

The Authors

Mr. Joan Masó (MSc in Physics, 1994, and MSc in Electronic Engineering) has been a researcher at CREAF since 1995. He co-created the MiraMon compressed map and the MiraMon Map Reader concept in 1997, and is a co-developer of the MiraMon OGC WMS, WFS and WCS server and client technology. He has been an active member of the OGC Technical Committee (TC) since 2003 and is the editor of the recently approved OGC WMTS standard (OGC 07-057r7), the Spanish representative in the current ISO 19115 revision process and an active member of the GEOSS Standards Interoperability Forum. He is the scientific coordinator of the EU FP7 GeoViQua project and an IEEE member.

Dr. Xavier Pons is a full professor in the Department of Geography of the UAB and the recipient of an ICREA Acadèmia Excellence in Research grant (2011-2015). His main work has been in radiometric and geometric corrections of satellite imagery, cartography of ecological parameters from airborne sensors, and GIS development, both in terms of data structure, organization and international standards for geoservices and in terms of software writing (MiraMon). He has also worked on forest fire mapping and on the analysis of landscape changes, water usage and snow cover from long time series of images. He is currently contributing to climate modelling and to the study of the implications of image compression and dissemination.

Bastian Schäffer is a research assistant at the Institute for Geoinformatics (IfGI) and head of the geoprocessing community of the 52°North open source initiative. His research interests focus on interoperability, SDIs, geoprocessing workflows and cloud computing.

Dr. Theodor Foerster (born 1980) is a research associate at the Institute for Geoinformatics of the University of Muenster, Germany, where he leads the Sensor Web, Web-based Geoprocessing and Simulation Lab. He has more than five years of professional experience, with a strong focus on the design and development of web-based architectures for geographic applications. Until November 2009, he was a PhD candidate at the International Institute for Geoinformation Science and Earth Observation in Enschede, the Netherlands, and received his PhD degree from the University of Twente for his research on Web-based Architecture for On-demand Maps – Integrating Meaningful Generalization Processing. Before that, he worked at IfGI as a researcher. In 2004, he obtained a Diploma degree in Geoinformatics from the University of Muenster.

Dr. Roberto Lucchi is a project manager and the ArcGIS for INSPIRE product manager at Esri. He is an expert in spatial data infrastructures, geoportal design and implementation, and Web and GIS standards. In 2004, he earned a PhD in computer science from the University of Bologna (Italy); his research topics included coordination models, orchestration and choreography languages, and their expressive power.
