MSC Project Topics in Computer Science
ABSTRACT
Semantic integration of heterogeneous databases requires that user queries be understood
semantically and related to the actual contents of the underlying databases. The state of the art in
data integration focuses on using ontologies, described as formal and explicit
conceptualizations, as a semantic foundation for data integration. Unlike previous research, which
assumed that the databases have well-structured ontologies, this thesis employs the WordNet
lexical database to mine synonyms from WordNet synsets into clusters. A query reformulation
algorithm is proposed to reformulate the user query based on the retrieved synonyms and then
look up all the resulting queries in the database(s). The implementation turned out to be effective in
practical applications, with a particular set of data, as the system is able to identify terms correctly
based on synsets. Whereas previous methods rely only on the terms available in WordNet, this
thesis can handle a term unknown to WordNet by inserting the new term into an existing synset
in the WordNet database. Although the evaluation results show that the proposed approach
improves recall, the system incurs high execution time as the number of tables increases.
CHAPTER ONE
GENERAL INTRODUCTION
1.1 INTRODUCTION
Technological advancement has led to tremendous growth of data in the
different databases and information sources available on the web. The web is
the largest source of information ever conceived by man. Business data for a
particular domain are hosted on the web and can help users or organizations
make decisions when they need to choose among alternatives. Hence, it is
necessary and crucial that this information source is complete, precise and can
be acquired on time (Abdul-Kareem and Hajmoosaei, 2007). However, each
information source on the web is designed and used differently, according to
the requirements of the organization that owns it. This independent nature of
individual databases causes heterogeneity among the databases. As long as the
heterogeneity exists, access to all available information remains limited,
since information is stored separately without easy means of integrating it
across sources. The information society, however, requires complete and easy
access to all available information. This necessitates effective methods to
resolve heterogeneity and combine data from different data sources. Kashyap and
Sheth (2006) opined that retrieving, accessing and using information from
different data sources requires the data to be integrated, because of the
heterogeneity within and among the various data sources (databases). A data
integration system aims to provide users with easy and efficient access to
information from different data sources through a common interface.
Integrating data from distributed, heterogeneous, autonomous data sources is a
fundamental requirement in many database application domains (Amshakala and
Nedunchezhian, 2011). A data integration system should allow users to
concentrate on what information is needed without having to specify in detail
how to retrieve the required information. In recent years, there has been
considerable interest in the integrated database, a system which attempts to
logically integrate a number of independent distributed databases while
allowing the local databases to maintain complete control of their operations.
Thus, an integrated system is a distributed database system in which each site
maintains complete autonomy. Database integration has been recognized as the
solution to disparate data sources. It deals with the data transparency problem
of distributed systems: it has to make users think that they are accessing a
single information system with homogeneous data structures, although the data
is actually physically distributed over heterogeneous data sources. An
important requirement for achieving an integrated database is the ability to
identify semantically equivalent data items in the component databases. Semantic
heterogeneity is the major source of problems in integrated database design.
These conflicts result from the use of different structures for the same
information and the use of different names for the same entity in the same
domain. Ontologies have been suggested as a cornerstone for solving the problem
of semantic heterogeneity in database integration. The basis is that an
ontology can offer a shared common understanding of the application domain. It
provides the meaning of terms and their relationships, which facilitates
communication by providing precise notions that can be used to compose messages
(queries, statements) about the domain (Li and Chow, 2009). Previous research
on the use of ontologies to address semantic problems was limited to
well-defined, well-structured and domain-specific ontologies (Guarino and
Giaretta, 1995; Gulati and Sharma, 2010; Ghawi and Cullot, 2009). In addition,
most of these works are mere proposals, hand-coded and not implemented. A
hand-coded ontology has low coverage of named entities and is domain specific,
while building an ontology automatically is a time-consuming and daunting task.
Making use of an existing ontology and augmenting it with new term(s) could be
of great benefit and save time. In this direction, this research proposes the
use of WordNet (an existing ontology) to serve as a controlled vocabulary and
as a mechanism for comparing the semantic similarity of words. If semantic
relations such as synonymy are incorporated in the search, then semantically
related terms may be retrieved and presented to the user, even if they are not
explicitly present in the query. By replacing the query terms with a mechanism
which can identify semantically similar terms, the challenge posed by the use
of different terms to define the same concept may be handled.
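The idea of replacing a query term with its semantically similar terms can be
sketched as follows. This is a minimal illustration under stated assumptions,
not the implementation described in this thesis: the SYNSETS table is a small
hand-made stand-in for WordNet synsets, and the function names expand_term,
add_term and reformulate are hypothetical.

```python
# Toy stand-in for WordNet synsets: each set groups terms treated as synonyms.
# The contents are illustrative assumptions, not data taken from WordNet.
SYNSETS = [
    {"car", "automobile", "auto"},
    {"price", "cost"},
]


def expand_term(term, synsets=SYNSETS):
    """Return the term together with its synonyms, sorted for stable output."""
    term = term.lower()
    cluster = {term}
    for synset in synsets:
        if term in synset:
            cluster |= synset
    return sorted(cluster)


def add_term(new_term, known_term, synsets=SYNSETS):
    """Insert a term missing from the table into the synset of a known term."""
    for synset in synsets:
        if known_term.lower() in synset:
            synset.add(new_term.lower())
            return True
    return False  # no synset contains known_term, so nothing was changed


def reformulate(terms, synsets=SYNSETS):
    """Map each query term to the list of terms to look up in its place."""
    return {term: expand_term(term, synsets) for term in terms}
```

With this toy table, reformulate(["car"]) maps "car" to the cluster
["auto", "automobile", "car"], so a lookup for "car" would also match records
stored under "automobile". A term that appears in no synset is returned
unchanged until add_term attaches it to an existing synset.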
1.2 BACKGROUND OF THE STUDY
Data integration is a research discipline that studies mechanisms for seamless
access to autonomous and heterogeneous information sources (Amshakala and
Nedunchezhian, 2011). Data integration and information exchange systems
require a uniform interface that allows data to be transferred, processed and
shared across multiple distributed systems. Current data integration systems
are built upon traditional database systems, in which data are modeled in one
of the traditional data models. Because each data source is modeled in a
different data model and is available in a different form and format,
integration problems have emerged. These different data models have resulted in
a number of databases which are incompatible with one another. This has led to
the heterogeneity of the existing database systems. Different types of
heterogeneity, namely schematic, semantic and syntactic, exist among the data
sources that are to be integrated. Schematic heterogeneity emanates from the
different structures of the source schemas, semantic heterogeneity occurs when
different sources contain the same concept under different names, and syntactic
heterogeneity results from the different languages used for modeling the
sources. These heterogeneity issues necessitate data integration in order to
support better decision making and better functioning of the associated
information systems. Thus, there is a need to create a framework that
reconciles different database systems. Data integration has been recognized as
the solution to disparate data sources. It deals with the data transparency
problem of distributed systems: it has to make users think that they are
accessing a single information system with homogeneous data structures,
although the data may actually reside in heterogeneous, autonomous and
geographically distributed sources.
The process of heterogeneous database integration may be defined as the
creation of a single uniform query interface to data that are collected and
stored in multiple, heterogeneous databases (Raji, 2010). Interoperability
between databases should be supported without modifying the databases or losing
their autonomy, and in a way that is relatively transparent to the user and the
application. The state of the art in data integration focuses on using
ontologies, an ontology being a formal and explicit specification of a shared
conceptualization (Gruber, 2009), to provide semantic support for data
integration. Ontologies have proven useful for capturing the semantic content
of data sources and for unifying the semantic relationships between
heterogeneous structures (Necib and Freytag, 2005). With this vision, this
thesis proposes a data integration system based on an ontology approach. To
this end, the research attempts to use WordNet (an existing ontology) and adapt
it to the associated databases in order to overcome semantic heterogeneity. The
main focus is on the part of semantics related to the meaning of the terms used
as identifiers in the various databases.
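A single uniform query interface over databases that name the same concept
differently can be sketched as below. This is an illustrative sketch only, not
the system built in this thesis: the cluster SALARY_TERMS is written by hand
where a WordNet synset would be used, the two in-memory SQLite databases stand
in for autonomous sources, and the names make_source, matching_columns and
query_all are hypothetical.

```python
import sqlite3

# Hypothetical synonym cluster for one concept; in the thesis setting such a
# cluster would come from a WordNet synset rather than being written by hand.
SALARY_TERMS = {"salary", "wage", "pay"}


def make_source(table, column, rows):
    """Create an in-memory database with one table of (name, <column>) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" (name TEXT, "{column}" REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    return conn


def matching_columns(conn, terms):
    """Return (table, column) pairs whose column identifier is in the cluster."""
    matches = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        for info in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            column = info[1]  # second field of table_info is the column name
            if column.lower() in terms:
                matches.append((table, column))
    return matches


def query_all(sources, terms):
    """Run the same logical query on every source and merge the results."""
    results = []
    for conn in sources:
        for table, column in matching_columns(conn, terms):
            sql = f'SELECT name, "{column}" FROM "{table}"'
            results.extend(conn.execute(sql).fetchall())
    return results
```

For example, if one source stores the value in a column named "salary" and
another in a column named "wage", query_all answers a single request over both
without changing either schema, which keeps each source autonomous. Building
SQL from identifiers with f-strings is acceptable here only because the
identifiers come from the databases' own catalogs, not from user input.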
INTEGRATING DATABASES AND WORDNET FOR SEMANTIC SEARCH
Department: Computer Science (M.Sc)