THESIS TITLE:

Semantically enhanced Information Retrieval: an ontology-based approach


 
 

AUTHOR:

Miriam Fernandez Sanchez

SUPERVISOR:

Pablo Castells Azpilicueta

TUTOR:

Pablo Castells Azpilicueta

DATE

GRADE


 


SUMMARY:

 
 


 


ABSTRACT:

The amount of content stored and shared on the Web and in other document repositories keeps increasing steadily and fast. This growth results in well-known difficulties, such as finding and properly managing all the available information. Striking progress has been achieved in the last decade with the development of search engine technologies, which collect, store, and pre-process this information to return relevant resources in response to users’ needs. However, users still sometimes miss their targets, or need considerable effort to reach them, even when the sought information is present in the search space.

A common cause is that the currently consolidated content description and query processing techniques for Information Retrieval (IR) are based on keywords, and therefore provide limited capabilities to grasp and exploit the conceptualizations involved in user needs and content meanings. The resulting limitations include the inability to describe relations between search terms (e.g., “hurricanes originated in Mexico” vs. “hurricanes that have affected Mexico”, “books about recommender systems” vs. “systems that recommend books”), and a weakness in coping with linguistic phenomena such as polysemy (e.g., “mouth” as part of the body vs. “mouth” as the point where a stream issues into a larger body of water) or synonymy (e.g., finding “movies” when the user queries for “films”).

Aiming to solve the limitations of keyword-based models, the idea of conceptual search, understood as searching by meanings rather than literal strings, has been the focus of a wide body of research in the IR field. More recently, it can be said to have become one of the “philosopher’s stones” of the Semantic Web (SW) community since its emergence in the late nineties. However, the undertakings in information search and retrieval from the semantic-based technology area have not yet taken full advantage of the technologies, background, knowledge, and experience accumulated through several decades of work in the IR tradition. One might even say there is some mismatch in the understanding of basic notions in both areas, such as information need, relevance, retrieval task, and methodological soundness.

Starting from this position, this thesis investigates the definition of ontology-based IR models, oriented to the exploitation of domain knowledge bases (KBs) to support semantic search capabilities in large document repositories. It stresses, on the one hand, the use of full-fledged ontologies in the semantic-based perspective and, on the other, the consideration of unstructured content as the final search space. In other words, we have explored the use of semantic information to support more expressive queries and more accurate results, while formulating the retrieval problem in a way that is proper to the IR field, thus drawing benefit from the state of the art in this area and enabling more realistic and applicable approaches.