Semantic Network Dictionary for Ontological Information with Wiki

Masahiko Nagai, Masafumi Ono, Ryosuke Shibasaki

INTRODUCTION

The global environment spans many disciplines, such as meteorology, hydrology, geology, geography, agriculture, and biology. Addressing global environmental problems, such as climate change, global warming, and various disasters, requires working across these disciplines. One of the key issues is arranging data interoperability under trans-disciplinary conditions. Data interoperability has two aspects: syntactic interoperability and semantic interoperability. Both must be improved for the integrated use of heterogeneous data. To improve syntactic interoperability, many efforts have already been made, such as standardization of data formats and development of XML-based data encoding rules, e.g. ISO (International Organization for Standardization) standards and OGC (Open Geospatial Consortium) standards. Improving semantic interoperability requires a common understanding among different ontologies, terminologies, and taxonomies, i.e. definitions and associations of various concepts/terms, name spaces, classification schemes and so on, which are collectively called an “ontology”.

The word “ontology” was originally used in philosophy to refer to the branch of metaphysics that deals with the nature of being. Currently, in the context of knowledge sharing, the term means a specification of a conceptualization [1]. In recent years, several institutions have initiated efforts to propose a standard ontology and/or terminology/taxonomy related to Earth observation. SWEET (Semantic Web for Earth and Environmental Terminology) by NASA (National Aeronautics and Space Administration) is one such ontology [2]. FAO (Food and Agriculture Organization of the United Nations) is making similar efforts based on AGROVOC, a multilingual, structured and controlled vocabulary designed to cover the terminology of all subject fields in agriculture, forestry, fisheries, food and related domains [3]. Many other ontologies and terminologies/taxonomies are expected to be proposed by other expert/professional communities and institutions.

For data interoperability, ontological information including terminologies, taxonomies, glossaries, etc., must be collected, managed, referenced and compared; for example, data dictionaries, classification schemata, terminologies, thesauri, and their relations are handled. A common understanding of heterogeneous semantic information is used for data sharing and data services, such as supporting data retrieval, metadata design, and information mining.

We have constructed a semantic network dictionary to use ontological information effectively. The dictionary is built for information sharing on Semantic MediaWiki, which helps to gather ontological information and associations for data interoperability among diversified and distributed data sources. An ontology is generally applied to a strict and well-defined purpose, with classes and instances, as in a task ontology [4]; in this study, however, the scope of ontologies is not restricted and comprises any reference information based on the terminology of technical terms for data interoperability. The semantic network dictionary serves as a “knowledge writing tool” for experts by extracting semantic relations from authoritative documents using natural language processing techniques, such as morphological analysis and semantic analysis.

Fig 1. Semantic MediaWiki editing interface

WHAT IS A SEMANTIC NETWORK DICTIONARY?

A semantic network dictionary was developed to store ontological information with Semantic MediaWiki. In a semantic network dictionary, each term is expressed by its definition and its relations to other terms, such as is-a, part-of, synonym, and homonym. Entry words, definitions, sources, and authors are handled as nodes, and relations to other terms are handled as links, so each term is surrounded by its related terms. The semantic network dictionary has three key requirements: reliability, simple structure, and easy browsing and modification.
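The node-and-link structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual data model: the class names `Entry` and `SemanticNetworkDictionary` are hypothetical, the definition, source, and author are stored as attributes of an entry rather than as separate nodes for brevity, and the sample entries are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One entry word with its definition and provenance (hypothetical layout)."""
    word: str
    definition: str
    source: str
    author: str
    # Outgoing links stored as (relation, target word) pairs, e.g. ("is-a", "water body").
    links: list = field(default_factory=list)


class SemanticNetworkDictionary:
    """Terms are nodes; typed relations between terms are links."""

    def __init__(self):
        self.entries = {}

    def add_entry(self, entry):
        self.entries[entry.word] = entry

    def relate(self, word, relation, target):
        """Add a typed link from one registered term to another term."""
        self.entries[word].links.append((relation, target))

    def neighbors(self, word, relation=None):
        """Return linked terms, optionally restricted to one relation type."""
        return [t for r, t in self.entries[word].links
                if relation is None or r == relation]


# Illustrative entries only; the definitions and sources are placeholders.
dictionary = SemanticNetworkDictionary()
dictionary.add_entry(Entry("water body", "placeholder definition", "placeholder source", "placeholder author"))
dictionary.add_entry(Entry("lake", "placeholder definition", "placeholder source", "placeholder author"))
dictionary.relate("lake", "is-a", "water body")
print(dictionary.neighbors("lake", "is-a"))
```

Keeping the relation name as plain data, rather than as a fixed set of fields, means new relation types can be added without changing the structure.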

Reliability

The semantic information in a semantic network dictionary must be reliable, because users integrate data by referring to it. If reliability is not confirmed, interoperability is not achieved. To ensure reliability, reliable data sources should be selected, and data documentation must be clear. In this study, we collaborate with scientific societies and international organizations to ensure reliability. Lists of technical terms and associations of terms are provided as ontological information by specialists in each field. Reliability of data documentation is also achieved by recording the authors and titles of the references. Edits to terms, as well as their definitions, follow the original sources to maintain reliability. The reliable ontological information used in this study includes SWEET by NASA, the CUAHSI ontology by the CUAHSI-HIS project [5], the CEOS Missions, Instruments and Measurements database by CEOS (Committee on Earth Observation Satellites) [6], the WMO glossary by WMO (World Meteorological Organization) [7], the GEMET thesaurus by EIONET (European Environment Information and Observation Network) [8], and so on.

Simple Structure

A semantic network dictionary consists of terms with definitions and their relations, so the basic structure of ontological information must be quite simple. This makes it easy to obtain a large amount of data from various sources and reduces the labor of data construction, which is one of the key points in collecting and managing ontological information. The original formats of ontological information include not only text and spreadsheet tables but also XML (Extensible Markup Language), RDF (Resource Description Framework), and OWL (Web Ontology Language). These formats are converted and expressed with tags for use in Semantic MediaWiki.
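As a rough sketch of what “expressed with tags” can look like, the following Python function renders one term as wiki text using Semantic MediaWiki's in-text annotation form `[[property::value]]`. The function name `to_wikitext` and the property names (“is a”, “part of”, “source”) are assumptions made for illustration; the property names actually used in the wikis of this study are not given here, and the sample values are placeholders.

```python
def to_wikitext(definition, relations, source=None):
    """Render one dictionary entry as wiki text with semantic annotations.

    `relations` is a list of (property, target) pairs. The property names
    are hypothetical examples, not the ones used in the study.
    """
    lines = [definition, ""]
    for prop, target in relations:
        # [[property::value]] links to the target page and records the relation.
        lines.append(f"* {prop}: [[{prop}::{target}]]")
    if source is not None:
        lines.append(f"* source: [[source::{source}]]")
    return "\n".join(lines)


# Placeholder values for illustration only.
page = to_wikitext(
    "Illustrative definition text.",
    [("is a", "water body"), ("part of", "hydrosphere")],
    source="Illustrative glossary",
)
print(page)
```

Because every input format reduces to the same list of (property, target) pairs, a converter for text, spreadsheet, XML, RDF, or OWL sources only needs to produce that list.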

Fig 2. Table editor, a Semantic MediaWiki plug-in

Easy Browsing and Modification

The purpose of a semantic network dictionary is to support data interoperability by making it easy to reference terms across trans-disciplinary fields. The structure of such a dictionary is a simple network among terms, so browsing the dictionary resembles following hyperlinks in a web browser or calling a web API (Application Programming Interface). It is also easy to add or edit links and nodes, to extract a certain part of the dictionary, and to export it in XML and RDF formats.

FUNCTIONS OF SEMANTIC NETWORK DICTIONARY

Registration

To collect ontological information that meets the above requirements, a registration system was developed based on Semantic MediaWiki (version SMW1.2). Semantic MediaWiki is a feature-rich wiki implementation that handles hyperlinks and has a simple text syntax for creating new pages and cross-links between terms [9]. Entry words, definitions, sources, and authors are handled as nodes with tags, and relations to other terms are handled as links. Each ontology or terminology is managed in a separate wiki; for example, a SWEET Wiki is created for SWEET and a GEMET Wiki for GEMET.

First, ontological information is added to Semantic MediaWiki by automatically converting it to XML and importing it into the wiki. Sometimes, ontological information is registered manually from books and Web pages; such existing dictionaries and glossaries are already considered ontological information, and OCR (optical character reader) is sometimes used to digitize them. Second, symbols and abbreviations that indicate related words and synonyms are extracted from the dictionary and converted from semantic structure to syntactic structure. Finally, the imported ontological information is modified by authorized users with the editing function of the wiki, as shown in Fig 1.
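The second step, extracting relation markers from a glossary entry, might look like the sketch below. The markers “syn.” and “see also” and the parenthesized layout are assumptions chosen for illustration; real dictionaries use their own symbols and abbreviations, and the sample entry is made up.

```python
import re

# Hypothetical markers mapped to relation names; a real glossary defines its own.
MARKERS = {"syn.": "synonym", "see also": "related"}

# Matches parenthesized notes such as "(syn. SST)" or "(see also ocean, buoy)".
PATTERN = re.compile(r"\((syn\.|see also)\s+([^)]+)\)", re.IGNORECASE)


def extract_relations(raw_definition):
    """Split a raw glossary definition into clean text and (relation, target) pairs."""
    relations = []
    for marker, targets in PATTERN.findall(raw_definition):
        relation = MARKERS[marker.lower()]
        for target in targets.split(","):
            target = target.strip()
            if target:
                relations.append((relation, target))
    # Remove the markers from the definition and tidy the leftover spacing.
    cleaned = PATTERN.sub("", raw_definition)
    cleaned = re.sub(r"\s+([.,;])", r"\1", cleaned)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, relations


# Made-up entry for illustration.
text, links = extract_relations(
    "Temperature of the ocean surface layer (syn. SST) (see also ocean, buoy)."
)
print(text)
print(links)
```

The extracted pairs can then be registered as links, while the cleaned text is kept as the definition.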

In Semantic MediaWiki, the depiction of content is expressed by tags. It is not easy to add or select appropriate relations with tags without knowledge of computer science, so in this study we developed a table-like editor as a wiki plug-in. The table editor is linked to the editing page of the wiki through a pop-up window and suggests appropriate tags to control community authoring. The table editor is implemented with dhtmlGrid, v1.2 standard, and XML is prepared for the Web server. Fig 2 shows the Semantic MediaWiki table editor, with which the user can browse and edit the explanation of a term without ambiguity in tags. Semantic MediaWiki displays not only definitions but also relations of terms among multiple wikis, and the table editor is used to modify these relations in a table without tags.

Fig 3. Reverse dictionary, which retrieves from multiple data sources

Information Retrieval

Ontological information registered in Semantic MediaWiki is retrieved with a reverse dictionary. A reverse dictionary finds the term for a concept from the definitions and associations of terms. The reverse dictionary is based on GETA (Generic Engine for Transposable Association), which was developed by the National Institute of Informatics, Japan [10]. GETA comprises tools for manipulating high-dimensional sparse matrices for text retrieval across more than one wiki at once; it is an engine for calculating associations, such as similarity measures across multiple wikis. To create the matrices used to compute similarity, morphological analysis is conducted for word segmentation, and a list of words to be ignored in the calculation is prepared. For example, the query “earth environment observation by satellite or air-craft” returns “remote sensing”. As another example, suppose a user wants to know about a “satellite for sea surface temperature”. The reverse dictionary returns a list of terms with similarity scores, as shown in Fig 3, such as “sea surface temperature” and “MSMR” in the CEOS terminology and “Thermal sea power” in the GEMET terminology. The reverse dictionary relates data by calculating similarity from definitions. Even a user without background knowledge can discover that the “MSMR” instrument is suited to monitoring sea surface temperature and that sea surface temperature is related to “Thermal sea power”.
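The idea of a reverse dictionary can be illustrated with a small similarity search over definitions. The sketch below does not use GETA or its API; it swaps in plain TF-IDF weighting with cosine similarity, and whitespace tokenization with a short ignored-word list stands in for morphological analysis. The function names and the toy definitions are assumptions made for illustration.

```python
import math
from collections import Counter

# Stand-in for the list of ignored words mentioned in the text.
STOP_WORDS = {"a", "an", "the", "of", "or", "by", "for", "and", "to", "from", "in", "on", "at"}

# Toy definitions written for this example, not taken from any registered wiki.
TOY_DEFINITIONS = {
    "remote sensing": "observation of the earth environment from a satellite or aircraft",
    "sea surface temperature": "temperature of the water at the surface of the sea",
    "land use": "classification of how people use an area of land",
}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, and drop ignored words."""
    words = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_index(definitions):
    """Return TF-IDF vectors per term (as sparse dicts) and the idf table."""
    docs = {term: Counter(tokenize(text)) for term, text in definitions.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    idf = {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}
    vectors = {t: {w: c * idf[w] for w, c in counts.items()} for t, counts in docs.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def reverse_lookup(query, vectors, idf, top_n=3):
    """Rank terms by similarity between the query and each term's definition."""
    counts = Counter(tokenize(query))
    q = {w: c * idf[w] for w, c in counts.items() if w in idf}
    scored = [(term, cosine(q, vec)) for term, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


vectors, idf = build_index(TOY_DEFINITIONS)
for term, score in reverse_lookup("earth environment observation by satellite or air-craft", vectors, idf):
    print(f"{term}: {score:.3f}")
```

With these toy definitions, “remote sensing” ranks first because it is the only definition that shares words with the query. Indexing the definitions of several wikis into one set of vectors is what lets a single query return terms from more than one terminology.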

Graphical Representation

To compare associations among key words from the various ontologies and terminologies, each of which is managed in its own wiki, a graph representation such as that shown in Fig 4 is useful. The graph representation is built with KeyGraph, an open-source Java library. The XML data constructed in the wiki is visualized together with the result of information retrieval by the reverse dictionary, so all the related terms from the various ontologies and terminologies are represented at once.

Fig 4. Graph representation

One example of graph representation concerns a term from the land use classification schemata of Thailand and Indonesia. The land use class “water body” can be found in both countries. The two classes appear to be the same, but their level in the hierarchy differs between the classification schemata. In the Indonesian schema, “water body” does not include watercourses, whereas “water body” in Thailand includes all water-related geographical features. The graph representation makes this distinction between the two terms clear. New information about the relation between the two countries can then be created, namely that the “water body” class in Thailand is the same as the “water” class in Indonesia. This kind of information is treated as newly created ontological information and is added through Semantic MediaWiki. By adding relations in this way, the ontological information can grow autonomously and become more and more useful.
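Registering such a newly found relation across wikis can be sketched as follows. The class `CrossWikiLinks` and the wiki names are hypothetical; each node is identified by a (wiki, term) pair so that identically named terms in different classification schemata remain distinct.

```python
from collections import defaultdict


class CrossWikiLinks:
    """Typed links between terms, where each node is a (wiki, term) pair."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, source, relation, target, symmetric=False):
        """Register a relation; symmetric relations such as same-as go both ways."""
        self._links[source].add((relation, target))
        if symmetric:
            self._links[target].add((relation, source))

    def related(self, node, relation):
        """Return all nodes linked from `node` by the given relation."""
        return sorted(t for r, t in self._links.get(node, ()) if r == relation)


# Wiki names are placeholders for the two land use classification wikis.
thai_water_body = ("Thailand land use", "water body")
indonesian_water = ("Indonesia land use", "water")

links = CrossWikiLinks()
links.add(thai_water_body, "same-as", indonesian_water, symmetric=True)

print(links.related(indonesian_water, "same-as"))
print(links.related(("Indonesia land use", "water body"), "same-as"))
```

The second lookup returns an empty list: the Indonesian “water body” class is a different node from the Thai one, so it does not inherit the same-as link.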

CONCLUSION

In conclusion, many standardization organizations are working on syntactic-level interoperability. At the same time, semantic interoperability must be addressed under heterogeneous and highly diversified conditions with large volumes of data. In this study, ontological information has been managed for data interoperability. This is a very challenging approach to Earth observation data integration, because collaboration with scientists of different disciplines is essential to its reliability. Multiple Semantic MediaWikis are used to register and update ontological information as part of a semantic network dictionary, which promises to be a useful tool for users. Registered ontologies supply the reference information required for interoperability. For the integration of Earth observation data, it is essential to clarify ontological information, such as same-as, contain, near, part-of, is-on, and consist-of relations. To invite contributions from the user community, it is necessary to provide sophisticated and easy-to-use tools and systems for a semantic network dictionary, such as a table-like editor, a reverse dictionary, and graph representation, for the sustainable development and use of ontological information.

ACKNOWLEDGMENT

This study is supported by the DIAS (Data Integration and Analysis System) project. The DIAS project is designated as a key technology of national importance in Japan and contributes to GEOSS.

REFERENCES

[1] B. Smith, Preprint version of chapter “Ontology”, in L. Floridi (ed.), Blackwell Guide to the Philosophy of Computing and Information, Oxford: Blackwell, 2003, pp.155–166.

[2] Jet Propulsion Laboratory, California Institute of Technology, Semantic Web for Earth and Environmental Terminology. Available: http://sweet.jpl.nasa.gov/index.html

[3] FAO, AGROVOC Thesaurus.
Available: http://aims.fao.org/website/AGROVOC-Thesaurus/

[4] Y. Kitamura, M. Kashiwase, M. Fuse and R. Mizoguchi, “Deployment of an ontological framework of functional design knowledge,” Advanced Engineering Informatics, Volume 18, Issue 2, April 2004, pp. 115-127.

[5] CUAHSI’s Hydrologic Information System.
Available: http://his.cuahsi.org/

[6] CEOS Missions, Instruments and Measurements database online.
Available: http://database.eohandbook.com/index.aspx

[7] WMO Space Programme – Glossary.
Available: http://www.wmo.int/pages/prog/sat/Glossary.html

[8] European Environment Information and Observation Network – EIONET.
Available: http://www.eionet.europa.eu/

[9] B. Leuf and W. Cunningham, “The Wiki Way: Quick Collaboration on the Web”, Addison-Wesley, USA, 2001.

[10] A. Takano, Y. Niwa, S. Nishioka, M. Iwayama, T. Hisamitsu, O. Imaichi, H. Sakurai, “Information Access Based on Associative Calculation”, in Lecture Notes in Computer Science LNCS:1963, Springer, 2000.

Masahiko Nagai is a visiting researcher at the University of Tokyo, Japan. He obtained his B.S. degree in Chemistry from St. Cloud State University, U.S.A., his M.S. degree in Remote Sensing and GIS in 2002 from the Asian Institute of Technology, Thailand, and his Doctoral Degree of Eng. in civil engineering in 2005 from the University of Tokyo, Japan. His current research focuses on the development of ontological information as an interoperability arrangement, as well as the construction of 3D models by integrating multiple sensors.

Masafumi Ono is a researcher at the University of Tokyo, Japan. He obtained his B.S. degree in 2000 from the Department of Electrical and Electronic Engineering, Kobe University. His research interests are geospatial ontology and qualitative spatial reasoning based on the theory of descriptive and predicate logic.

Ryosuke Shibasaki is a professor at the Center for Spatial Information Science, the University of Tokyo. He obtained his B.S. degree in 1980, M.S. degree in 1982, and Doctoral Degree of Eng. in 1987, all in civil engineering from the University of Tokyo. From 1982 to 1988, he worked at the Public Works Research Institute, Ministry of Construction. From 1988 to 1991, he was an associate professor in the civil engineering department, the University of Tokyo. His research interests cover 3D data acquisition for GIS, conceptual modeling for spatial objects, and agent-based micro simulation in a GIS environment, as well as GPS.