DM Ya
Sales forecasting in the retail industry is a highly complex problem in today’s global market.
At the management level, however, sales forecasts are essential to all decision
activities in the various functional areas of a retail company, such as marketing, sales,
production/purchasing, finance, and accounting. They provide the basis for regional and
national distribution and stock replenishment plans. In this research study, we explore
soft computing and data mining techniques to develop a prototype solution based
Knowledge Discovery
Today, companies use information technology to store the transaction data of their entire
business operations. This enormous and growing volume of data, collected from different
sources and fields, can be accessed and analyzed to extract valuable knowledge that supports
the business decision-making process. This process is called knowledge discovery: a pattern
search methodology that searches for exact patterns in the database using different algorithms
and methods. The proposed prototype knowledge discovery process consists of a number of
interactive and iterative steps, with many decisions and queries introduced by the user. These
steps are described below:
1) Goals of the Application Domain: The knowledge discovery process begins by understanding
the goals of the application domain and the goals of the data mining process.
The main goal is knowledge discovery, i.e., finding previously unknown patterns that are useful
and effective in the decision-making process, from extracted patterns that describe current and past
heterogeneous databases and data warehouses and integrated into a research database. The
relevant data for the analysis process is targeted and retrieved from this data source. Further data
characterization is performed by summarizing the general characteristics of the target class of data.
3) Data Cleaning and Clustering: Using various algorithms, the database is cleaned by removing
errors, ensuring consistency, eliminating redundant data, and transforming the selected
data set into a format appropriate for the prototype data mining procedure. Clustering is performed
4) Data Mining Process: Data mining is performed on the selected data sets to extract
interesting patterns by using the appropriate data mining algorithms and methods.
An association rule is an expression of the form X ⇒ Y, where X and Y are sets of elements
with no elements in common. Given a data source of transactions D, where each transaction
is a set of elements, if a transaction contains X then there is a probability that it also
contains Y. The rule X ⇒ Y holds in the transaction set T with confidence c if c% of the
transactions in T that contain X also contain Y. The rule has support s in T if s% of the
transactions in T contain both X and Y. Association rule based data mining locates all
association rules whose support and confidence are greater than or equal to a user-specified
minimum support (minsup) and minimum confidence (minconf). The data mining process for
extracting valuable association rules consists of 1) discovering all itemsets that satisfy
minsup (known as frequent-itemset generation) and 2) generating all association rules that
satisfy minconf using the itemsets generated by the first step. To perform these steps,
association rule mining algorithms employ either a breadth-first search (BFS) approach or a
depth-first search (DFS) approach. In our prototype we use the BFS approach, which determines
the support values of all (k − 1)-itemsets before calculating the support values of the
k-itemsets, where k is a positive integer.
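The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the prototype's actual implementation: the function names (support, frequent_itemsets, association_rules) and the sample transactions are hypothetical, and support and confidence are expressed as fractions in [0, 1] rather than percentages. Candidate k-itemsets are built only from frequent (k − 1)-itemsets, which is the level-wise (breadth-first) order described above.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every element of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, minsup):
    """Breadth-first search: every (k-1)-itemset is counted before any k-itemset."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}  # 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count support for all candidates of size k, keep those meeting minsup.
        survivors = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= minsup:
                survivors[candidate] = s
        frequent.update(survivors)
        # Join frequent k-itemsets into (k+1)-candidates, then drop any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(sub) in survivors
                        for sub in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, minconf):
    """Generate rules X => Y from frequent itemsets; returns (X, Y, support, confidence)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                conf = sup / frequent[x]  # support(X and Y) / support(X)
                if conf >= minconf:
                    rules.append((x, itemset - x, sup, conf))
    return rules

# Hypothetical sample data, for illustration only.
sample = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(sample, minsup=0.5)
for x, y, sup, conf in association_rules(found, minconf=0.7):
    print(sorted(x), "=>", sorted(y), "support", sup, "confidence", conf)
```

The sketch recounts support by scanning all transactions for each candidate, which keeps the code short but would be slow on a real transaction database, and it assumes a non-empty transaction list.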
In the DFS approach, the data are represented in a tree structure. The algorithm can start
from, say, node a in the tree and count its support to determine whether it is frequent. If so,
the algorithm expands to the next level of nodes until an infrequent node is reached. It then backtracks its
5) Classification: Classification, or supervised learning, is the process of finding a set of
models (functions) that describe data classes, where the models are derived from a set of training data
techniques.
[Figure: schema diagram. Tables and attributes recoverable from the diagram:]
Event: Event ID, Event Description
Pattern: Pattern ID, Pattern Description
Product: Product ID, Product Description
Branch: Branch ID, Branch Description
Fact Table: Event ID, Product ID, Pattern Description
Event Hierarchy: Event ID, Event Level, Event Condition
Event Pattern: Event ID, Pattern ID
Time granularity: Year, Quarter, Month, Week, Day