Use of background real-world knowledge in ontologies for word sense disambiguation in the semantic web

Bibliographic Details
Main Author: Legrand, Steve
Format: Doctoral dissertation
Language: eng
Published: 2008
Online Access: https://jyx.jyu.fi/handle/123456789/103642
Description
Summary: The purpose of this thesis is to show how word sense disambiguation (WSD) can be improved with background real-world knowledge encoded in ontologies, especially ontologies based on psychological considerations. Ontologies are used because conceptualized background knowledge is not directly available to WSD systems from texts. Although text can be disambiguated to some extent without ontologies, this kind of knowledge is of great help to WSD, especially in an environment like the Semantic Web, which has been the principal motivating factor behind this thesis. Some real-world knowledge that is indispensable for human understanding cannot be readily encoded in conventional ontologies either. One fundamental type of such embodied knowledge is basic-level categories. After showing that conventional ontologies can be used, with the help of self-organizing maps, to automatically group and label concepts in a text for disambiguation purposes, the thesis extends the idea to ontological structures based on basic-level categories. The thesis shows that the use of basic-level categories in WSD significantly improves accuracy. It also shows that linguistic phenomena, such as metaphoric expressions, can be manipulated structurally to reduce them to basic-level components that could potentially be used in WSD. The approach proves fruitful and can serve as a starting point for designing an application that not only disambiguates using hybrid systems (including an ontological real-world component) but also selects the most suitable disambiguation system for a particular word.