ABSTRACT:
The amount of content stored and shared on the Web and in other document
repositories keeps increasing steadily and fast. This growth results in
well-known difficulties, such as finding and properly managing all the
available information.
Striking progress has been achieved in the last decade with the development
of search engine technologies, which collect, store and pre-process this
information to return relevant resources in response to users’ needs.
However, users sometimes still miss their targets, or need considerable effort
to reach them, even if the sought information is present in the search space.
A common cause for this is that the currently established content description
and query processing techniques
for Information Retrieval (IR) are based on keywords, and therefore provide
limited capabilities to grasp and exploit the conceptualizations involved in
user needs and content meanings. This involves limitations such as the
inability to describe relations between search terms (e.g., “hurricanes
originated in Mexico” vs. “hurricanes that have affected Mexico”, “books
about recommender systems” vs. “systems that recommend books”), or the
difficulty of coping properly with linguistic phenomena such as polysemy (e.g.,
“mouth” as part of the body vs. “mouth” as the point where a stream issues
into a larger body of water) or synonymy (e.g., find “movies” when the user
queries for “films”).
To address the limitations of keyword-based models, a wide body of research
in the IR field has focused on the idea of conceptual search, understood as
searching by meanings rather than literal strings. More recently, conceptual
search can be said to have become one of the “philosopher’s stones” of the
Semantic Web (SW) community, ever since the community's emergence in the late
nineties. However, the undertakings in information
search and retrieval from the semantic-based technology area have not yet
taken full advantage of the technologies, background, knowledge, and
experience accumulated through several decades of work in the IR tradition.
One might even say there is some mismatch between the two areas in the
understanding of basic notions such as information need, relevance, retrieval
task, and methodological soundness.
Starting from this position, this thesis investigates the definition of
ontology-based IR models oriented to the exploitation of domain knowledge
bases (KBs) to support semantic search capabilities in
large document repositories, stressing on the one hand the use of
full-fledged ontologies in the semantic-based perspective, and on the other
the consideration of unstructured content as the final search space. In other
words, we have explored the use of semantic information to support more
expressive queries and more accurate results, while the retrieval problem is
formulated in the way that is customary in the IR field, thus benefiting from
the state of the art in this area and enabling more realistic and applicable
approaches.