CHAPTER ONE
1.1. INTRODUCTION
Data mining is described as the extraction of hidden, useful information from large collections of data. It encompasses a wide range of statistical and computational techniques such as link analysis, clustering, classification, summarization, and regression analysis. Data mining tools predict future trends and behaviors, allowing businesses to make knowledge-driven decisions. The automated, prospective analyses offered by data mining go beyond the analysis of past events: data mining tools provide answers to business questions that were traditionally too time-consuming to resolve, searching databases for hidden patterns and finding useful information that might otherwise escape the specialists. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel-processing computers, data mining tools can analyze huge databases to answer questions such as "Which goods do consumers tend to buy the most, and which goods are bought along with them?"
Coenen (2010), in his publication "Data Mining: Past, Present and Future", discussed the history of data mining, which can be dated back at least to the late 1980s, when the term began to be used within the research community and was differentiated from SQL querying. Broadly, data mining can be defined as a set of mechanisms and techniques, realized in software, to extract hidden information from data; the word "hidden" in this definition is important. By the early 1990s data mining was commonly recognized as a sub-process within a larger process called Knowledge Discovery in Databases (KDD). The most commonly used definition of KDD is that of Fayyad et al., as "the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data" (Fayyad et al. 1996).
As such, data mining should be viewed as the sub-process, within the overall KDD process, concerned with the discovery of hidden information. Other sub-processes that form part of the KDD process are data preparation (warehousing, data cleaning, pre-processing, and so on) and the analysis/visualization of results. For many practical purposes KDD and data mining are treated as synonymous, but technically one is a sub-process of the other. The data that data mining techniques were originally directed at was tabular data and, given the processing power available at the time, computational efficiency was of significant concern. As the amount of processing power generally available increased, efficiency became less of a concern and was replaced with a desire for accuracy and a desire to mine ever larger data collections. Today, in the context of tabular data, we have a well-established range of data mining techniques available.
It is well within the capabilities of many commercial enterprises and researchers to mine tabular data, using software such as Weka, on standard desktop machines. However, the amount of electronic data collected by institutions and commercial enterprises continues to grow year on year, so there is still a need for effective mechanisms to mine ever larger data sets. The popularity of data mining increased significantly in the 1990s, notably with the establishment of a number of dedicated conferences: the ACM SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) annual conference in 1995, the European PKDD (Principles and Practice of Knowledge Discovery in Databases) conference, and the Pacific-Asia PAKDD (Pacific-Asia Conference on Knowledge Discovery and Data Mining) conference. This increase in popularity can be attributed to advances in technology: the computer processing power and data storage capabilities available meant that processing large volumes of data on desktop machines was a realistic possibility. It became commonplace for commercial enterprises to maintain data in computer-readable form; in most cases this was primarily to support commercial activities, and the idea that this data could be mined often came second. The 1990s also saw the introduction of customer loyalty cards that allowed enterprises to record customer purchases; the resulting data could then be mined to identify customer purchasing patterns. Data mining, then, is the process of searching large volumes of data for patterns using methods such as classification, association rule mining, and clustering. It is closely related to machine learning and pattern recognition, and data mining techniques are the result of a long process of research and product development.
I am in my final year. I was bright and brilliant, and my family was optimistic about me; they thought highly of me, but I had one fault. What was my fault? I hated compiler construction. I have struggled with calculations all my life, though I have been lucky and did well all the same. Still, I had to write my final exam. I gathered the compiler construction past questions for each year, compared them, and sorted them. Guess what I discovered: over 35% of the questions were repetitions. I had hit the jackpot. I carefully and thoroughly checked through the answer pages, and from then on I concentrated my revision on the repeated questions. Well, I have a good grade to show for the data mining I performed.
There is a huge amount of data available in the information industry. This data is of no use until it is converted into useful information, so analyzing this data and extracting useful information from it is necessary. Extraction is not the only process we need to perform; the workflow also involves data pre-processing (data cleaning, data integration, data transformation), data mining, pattern evaluation, and data presentation. Once all these processes are complete, we are in a position to use this information in many applications such as fraud detection, market analysis, production control, and science exploration.
1.2. PROBLEM STATEMENT
Through in-depth research and observations carried out in a supermarket, we have discovered that retailers want to know which products are purchased together, whether in pairs or as larger groups of items. This knowledge can support their decision making with respect to product placement and to determining the timing and extent of promotions, and can give them a better understanding of customer purchasing habits by grouping customers according to their transactions.
This project is aimed at designing and implementing a well-structured market basket analysis software tool to solve the problem stated above, and at comparing its results to those of an existing tool called WEKA.
1.3. AIM AND OBJECTIVES OF THE STUDY
The aim of the study is to maximize profit for retailers by enabling them to provide better services to consumers.
The objectives of this study are:
Cross-Market Analysis - performing association/correlation analysis between product sales.
Identifying Customer Requirements - identifying the best products for different customers, and using prediction to find the factors that may attract new customers.
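The cross-market analysis objective rests on two standard association-rule measures, support and confidence. As a minimal sketch (the transactions and item names below are invented for illustration, not taken from the project's data):

```python
# Hypothetical supermarket transactions, each a set of purchased items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# {bread, butter} appears in 3 of the 5 transactions (support 3/5).
print(support({"bread", "butter"}, transactions))
# Of the 4 transactions containing bread, 3 also contain butter.
print(confidence({"bread"}, {"butter"}, transactions))
```

A rule such as "bread → butter" is then judged interesting when both its support and its confidence exceed retailer-chosen thresholds.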
1.4. METHODOLOGY
Data Pre-Processing
Because the data we are getting is raw data, and raw data in the real world may be incomplete, it has to be pre-processed. The raw data must go through data cleaning, data integration, data normalization, and data reduction, because without quality data there will be no quality mining results.
Data cleaning: filling in missing values and resolving inconsistencies in the raw data.
Data integration: combining data from multiple sources and presenting the user with a unified view of the data.
Data normalization: rescaling or restructuring data to minimize or reduce redundancy.
Data reduction: producing a reduced representation of the data set that is much smaller in volume yet yields the same (or nearly the same) analytical results.
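The cleaning, normalization, and reduction steps above can be sketched on a toy record set (the field names and values are hypothetical, not from the project's actual data):

```python
records = [
    {"item": "bread", "price": 250.0, "qty": 2},
    {"item": "bread", "price": 250.0, "qty": 2},   # exact duplicate row
    {"item": "milk",  "price": None,  "qty": 1},   # missing value
    {"item": "eggs",  "price": 900.0, "qty": 1},
]

# Data cleaning: fill a missing price with the mean of the known prices.
known = [r["price"] for r in records if r["price"] is not None]
mean_price = sum(known) / len(known)
for r in records:
    if r["price"] is None:
        r["price"] = mean_price

# Normalization: min-max scale prices into [0, 1] to reduce scale effects.
lo = min(r["price"] for r in records)
hi = max(r["price"] for r in records)
for r in records:
    r["price_norm"] = (r["price"] - lo) / (hi - lo) if hi > lo else 0.0

# Data reduction: drop exact duplicate (item, price, qty) rows.
seen, reduced = set(), []
for r in records:
    key = (r["item"], r["price"], r["qty"])
    if key not in seen:
        seen.add(key)
        reduced.append(r)
```

Mean imputation and min-max scaling are only two of many possible choices here; the project's actual pipeline could substitute, for example, median imputation or z-score normalization without changing the overall structure.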
1.5. SCOPE OF THE STUDY
The study focuses on the Babcock Ventures supermarket, and the scope of this project includes the following:
We aim to develop our very own market basket analysis software, which will be used at Babcock University.
The software will exhibit a colorful GUI (graphical user interface).
The software will be based on the Apriori algorithm.
We intend to conduct research into the various branches of science that this software will be based on, such as artificial intelligence.
We will develop software that will eventually stand out among other data mining software.
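Since the tool is to be based on Apriori, the following is a minimal sketch of Apriori-style frequent-itemset mining in plain Python (the transactions are invented, and this compact version omits the subset-based candidate pruning of the full algorithm, checking support directly instead):

```python
def apriori(transactions, min_support):
    """Return all itemsets with support >= min_support, mapped to their support."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    current = [fs for fs in (frozenset([i]) for i in items) if sup(fs) >= min_support]
    frequent = {fs: sup(fs) for fs in current}

    # Level k: join frequent (k-1)-itemsets, keep those meeting min_support.
    k = 2
    while current:
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates if sup(c) >= min_support]
        frequent.update({c: sup(c) for c in current})
        k += 1
    return frequent

transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
]
freq = apriori(transactions, min_support=0.5)
# Each single item and each pair meets the 0.5 threshold;
# the triple {bread, butter, milk} occurs in only 1 of 4 transactions and is pruned.
```

The Apriori property — that every subset of a frequent itemset must itself be frequent — is what the full algorithm exploits to prune candidates before counting; the sketch above relies only on the support check.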
1.6. LIMITATIONS OF THE STUDY
The limitations of this software include:
Data restrictions: this is a major factor standing in the way of the execution of this project. Since there is no data on households and individual consumers, we neglect such purchases.
Time constraints: this is also a major factor, because the software cannot work reliably on a small amount of raw data, which tends to mislead the retailer. In a nutshell, this software is intended for large volumes of data.