
Tuesday 12 September 2017

INTEGRATING DATABASES AND WORDNET FOR SEMANTIC SEARCH

MSC Project Topics in Computer Science

ABSTRACT 
Semantic integration of heterogeneous databases requires that user queries be understood semantically and related to the actual contents of the underlying databases. The state of the art in data integration uses ontologies, described as formal and explicit specifications of a shared conceptualization, as a semantic foundation for data integration. Unlike previous research, which assumed that the databases have well-structured ontologies, this thesis uses the WordNet lexical database to mine synonyms from WordNet synsets and group them into clusters. A query reformulation algorithm is proposed that reformulates the user query using the retrieved synonyms and then looks up all the reformulated queries in the database(s). The implementation proved effective in practical applications on a particular data set, as the system identifies terms correctly based on their synsets. Whereas previous methods relied only on terms already available in WordNet, this thesis can handle a term unknown to WordNet by inserting the new term into an existing synset in the WordNet database. Although the evaluation results show that the proposed approach improves recall, the system incurs a high execution time as the number of tables increases. 
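The synonym-mining and term-insertion steps described above can be illustrated with a minimal sketch. It assumes a tiny, hand-made in-memory lexicon (`SYNSETS`) in place of the real WordNet database, and the helper names `synonyms` and `add_to_synset` are hypothetical, not taken from the thesis. The abstract does not say how the target synset for an unknown term is chosen, so here the caller is assumed to supply a known synonym of the new term.

```python
# Hypothetical, hand-made miniature lexicon standing in for WordNet:
# each key names a synset and each value is the set of its lemmas.
SYNSETS = {
    "syn_salary": {"salary", "wage", "pay", "earnings", "remuneration"},
    "syn_employee": {"employee", "worker", "staff"},
}


def synonyms(term):
    """Cluster of lemmas sharing a synset with `term`; just the term itself if unknown."""
    term = term.lower()
    cluster = set()
    for lemmas in SYNSETS.values():
        if term in lemmas:
            cluster |= lemmas
    return cluster or {term}


def add_to_synset(new_term, known_term):
    """Insert a term unknown to the lexicon into every synset containing `known_term`."""
    new_term, known_term = new_term.lower(), known_term.lower()
    targets = [sid for sid, lemmas in SYNSETS.items() if known_term in lemmas]
    if not targets:
        raise KeyError(f"{known_term!r} is not in any synset")
    for sid in targets:
        SYNSETS[sid].add(new_term)
    return targets


print(synonyms("netpay"))                 # {'netpay'}: unknown to the lexicon
print(add_to_synset("netpay", "salary"))  # ['syn_salary']
print("netpay" in synonyms("wage"))       # True: now clustered with its synonyms
```

After the insertion, a query on any member of the salary cluster also reaches the new term, which is the behaviour the abstract attributes to the extended synset.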


CHAPTER ONE
GENERAL INTRODUCTION
1.1 INTRODUCTION
Technological advancement has led to tremendous growth of data in databases and in the information sources available on the web. The web is the largest source of information ever conceived by man. Many business data sources in a particular domain are hosted on the web and can help users or organizations make decisions when they need to choose between alternatives. Hence, it is necessary and crucial that these information sources be complete and precise and that they can be accessed in time (Abdul-Kareem and Hajmoosaei, 2007). However, each information source on the web is designed and used differently according to the requirements of its organization. This independence of the individual databases causes heterogeneity among them. As long as the heterogeneity exists, access to all available information remains limited, since the information is stored separately without an easy means of integrating it across sources. The information society, however, requires complete and easy access to all available information. This necessitates effective methods to resolve heterogeneity and combine data from different data sources. Kashyap and Sheth (2006) observe that retrieving, accessing and using information from different data sources requires the data to be integrated, because of the heterogeneity within and among the various data sources (databases). A data integration system aims to provide users with easy and efficient access to information from different data sources through a common interface. Integrating data from distributed, heterogeneous, autonomous data sources is a fundamental requirement in many database application domains (Amshakala and Nedunchezhian, 2011). A data integration system should allow users to concentrate on what information is needed without having to specify how the required information is to be retrieved. 
In recent years, there has been considerable interest in integrated databases: systems that attempt to logically integrate a number of independent distributed databases while allowing the local databases to maintain complete control of their operations. Thus, an integrated system is a distributed database system in which each site maintains complete autonomy. Database integration has been recognized as the solution to disparate data sources. It deals with the data transparency problem of distributed systems: users should perceive that they are accessing a single information system with homogeneous data structures, although the data is actually physically distributed over heterogeneous data sources. An important requirement for achieving an integrated database is the ability to identify semantically equivalent data items in the component databases. Semantic heterogeneity is the major source of problems in integrated database design. These conflicts result from the use of different structures for the same information and of different names for the same entity in the same domain. Ontologies have been suggested as a cornerstone for solving the problem of semantic heterogeneity in database integration. The rationale is that an ontology can offer a shared, common understanding of the application domain. It provides the meaning of terms and their relationships, which facilitates communication by providing precise notions that can be used to compose messages (queries, statements) about the domain (Li and Chow, 2009). Previous research on the use of ontologies to address these semantic problems was limited to well-defined, well-structured and domain-specific ontologies (Guarino and Giaretta, 1995; Gulati and Sharma, 2010; Ghawi and Cullot, 2009). In addition, most of these works are mere proposals, hand-coded and not implemented. Hand-coded ontologies have low coverage of named entities and are domain specific, while building an ontology automatically is a time-consuming and daunting task. 
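Identifying semantically equivalent data items across component databases can be sketched under one simplifying assumption: two attribute names are treated as naming the same concept if they are identical or appear together in at least one synset. The miniature lexicon, the two schemas and the functions `same_concept` and `match_attributes` below are hypothetical illustrations, not the thesis's implementation.

```python
# Hypothetical miniature lexicon: each set stands in for one WordNet synset.
SYNSETS = [
    {"salary", "wage", "pay", "earnings", "remuneration"},
    {"employee", "worker", "staff"},
]


def same_concept(a, b):
    """True if two identifiers are equal or share at least one synset."""
    a, b = a.lower(), b.lower()
    return a == b or any(a in lemmas and b in lemmas for lemmas in SYNSETS)


def match_attributes(schema_a, schema_b):
    """Pair attributes of two component schemas that name the same concept."""
    return [(x, y) for x in schema_a for y in schema_b if same_concept(x, y)]


hr_db = ["id", "worker", "wage"]
payroll_db = ["id", "employee", "salary", "dept"]
print(match_attributes(hr_db, payroll_db))
# [('id', 'id'), ('worker', 'employee'), ('wage', 'salary')]
```

Only naming conflicts are handled here; structural conflicts (the same information modelled with different structures) would need a richer mapping than name comparison.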
Making use of an existing ontology and augmenting it with new terms could be of great benefit and save time. In this direction, this research proposes the use of WordNet (an existing ontology) as a controlled vocabulary and as a mechanism for comparing the semantic similarity of words. If a semantic relation such as synonymy is incorporated into the search, then semantically related terms may be retrieved and presented to the user even if they are not specifically present in the query. By expanding the query terms with semantically similar terms identified by such a mechanism, the challenge posed by the use of different terms to define the same concept may be handled.
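To illustrate how incorporating synonymy into the search can raise recall, the following sketch expands a query term into its synonym set and matches any of them through a SQL `IN` list. The table `adverts`, its rows, the lexicon and the helpers `expand` and `search` are hypothetical; Python's built-in `sqlite3` module stands in for the underlying database.

```python
import sqlite3

# Hypothetical miniature lexicon: each set stands in for one WordNet synset.
SYNSETS = [{"car", "automobile", "auto"}, {"salary", "wage", "pay"}]


def expand(term):
    """Return the term together with all lemmas sharing a synset with it, sorted."""
    term = term.lower()
    out = {term}
    for lemmas in SYNSETS:
        if term in lemmas:
            out |= lemmas
    return sorted(out)


def search(conn, term):
    """Reformulate a one-term query into an IN-list of its synonyms and run it."""
    terms = expand(term)
    placeholders = ", ".join("?" for _ in terms)  # one bound parameter per synonym
    sql = f"SELECT id, item FROM adverts WHERE lower(item) IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, terms).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adverts (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO adverts (item) VALUES (?)",
    [("automobile",), ("car",), ("bicycle",), ("auto",)],
)

print(search(conn, "car"))  # [(1, 'automobile'), (2, 'car'), (4, 'auto')]
```

A literal match on `car` would return only row 2; the expanded query also returns rows 1 and 4, which use different words for the same concept. Bound parameters are used for the synonym values so that lexicon entries are never spliced into the SQL text.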

1.2 BACKGROUND OF THE STUDY
Data integration is a research discipline that studies mechanisms for seamless access to autonomous and heterogeneous information sources (Amshakala and Nedunchezhian, 2011). Data integration and information exchange systems require a uniform interface that allows data to be transferred, processed and shared across multiple distributed systems. Current data integration systems are built upon traditional database systems, in which data are modeled in one of the traditional data models. Because each data source is modeled in a different data model and is available in different forms and formats, integration problems have emerged. These different data models have resulted in a number of databases that are incompatible with one another, which has led to the heterogeneity of existing database systems. Different types of heterogeneity, such as schematic, semantic and syntactic heterogeneity, exist among the data sources to be integrated. Schematic heterogeneity arises from the different structures of the source schemas; semantic heterogeneity occurs when different sources represent the same concept with different names; and syntactic heterogeneity results from the different languages used to model the sources. These heterogeneity issues necessitate data integration in order to support better decision making and the functioning of the associated information systems. Thus, there is a need for a framework that reconciles different database systems. Data integration addresses the data transparency problem of distributed systems by presenting data that may reside in heterogeneous, autonomous and geographically distributed sources as a single information system with homogeneous data structures.

The process of heterogeneous database integration may be defined as the creation of a single, uniform query interface to data that are collected and stored in multiple heterogeneous databases (Raji, 2010). Interoperability between the databases should be supported without modifying the databases or compromising their autonomy, and in a way that is relatively transparent to users and applications. The state of the art in data integration focuses on using ontologies, each a formal and explicit specification of a shared conceptualization (Gruber, 2009), to provide semantic support for data integration. Ontologies have proven useful for capturing the semantic content of data sources and for unifying the semantic relationships between heterogeneous structures (Necib and Freytag, 2005). With this vision, this thesis proposes a data integration system based on an ontology approach. To this end, the research uses WordNet (an existing ontology) and adapts it to the associated databases to overcome semantic heterogeneity. The main focus is on the part of semantics related to the meaning of the terms used as identifiers in the various databases.
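A single, uniform query interface over several databases can be sketched as a mediator that reformulates a requested attribute into its synonyms and then inspects the catalog of every table in every source for matching columns. This is an assumed design for illustration only: the two in-memory SQLite databases, their schemas, the lexicon and the functions `make_source` and `uniform_query` are hypothetical. Because the columns of each table are examined separately, the work grows with the number of tables, which mirrors the kind of execution-time growth the abstract reports.

```python
import sqlite3

# Hypothetical miniature lexicon: each set stands in for one WordNet synset.
SYNSETS = [{"salary", "wage", "pay", "earnings"}, {"employee", "worker", "staff"}]


def synonyms(term):
    """The term plus every lemma that shares a synset with it."""
    term = term.lower()
    out = {term}
    for lemmas in SYNSETS:
        if term in lemmas:
            out |= lemmas
    return out


def make_source(table, name_col, value_col, rows):
    """Create an in-memory SQLite database standing in for one autonomous source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ("{name_col}" TEXT, "{value_col}" INTEGER)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    return conn


def uniform_query(sources, attribute):
    """Collect values of `attribute`, or any synonym of it, from every table of every source."""
    wanted = synonyms(attribute)
    results = []
    for src_name, conn in sources.items():
        # Table and column names are read from each source's own catalog.
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
            for col in columns:
                if col.lower() in wanted:
                    for (value,) in conn.execute(f'SELECT "{col}" FROM "{table}"'):
                        results.append((src_name, table, col, value))
    return results


sources = {
    "hr": make_source("staff", "name", "wage", [("Ada", 50000)]),
    "payroll": make_source("employee", "fullname", "salary",
                           [("Bola", 62000), ("Chidi", 48000)]),
}
for row in uniform_query(sources, "pay"):
    print(row)
```

The user asks for `pay` once; the mediator finds `wage` in one source and `salary` in the other without either database being modified, which is the autonomy requirement stated above.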


Department: Computer Science (M.Sc)
Format: MS Word
Chapters: 1 - 5, Preliminary Pages, Abstract, References, Appendix.
Delivery: Email
No. of Pages: 81

NB: The Complete Thesis is well written and ready to use. 

Price: 20,000 NGN

