
Table of Contents

Market basket analysis in a multiple store environment
• Introduction
• Need for a new Algorithm
• Problem Definition
• Algorithm Description

Graph Based Structure of Market Basket Analysis
• Market Basket
• Defining the problem
• Apriori Algorithm
• Limitation
• Similis Algorithm
• Problem Transformation
• Searching for the Maximum-weighted Clique
• Comparison
• Example
• Conclusion
Introduction
• Market basket analysis is a method of discovering customer
purchasing patterns
• Discovering such purchasing patterns can help
managers in designing store layout, web sites, product
mix and bundling, and other marketing strategies
• For a company with multiple stores, discovering purchasing
patterns that may vary over time and exist in all, or in
subsets of, stores can be useful in forming marketing,
sales, service, and operation strategies at the
company, local, and store levels
Need for a new Algorithm
Two main problems arise when using the existing methods in a multi-store
environment:
1. Temporal association rules:
   1. Static association rules either find patterns at a point in
   time or implicitly assume that the patterns stay the same over time
   and across stores
   2. In temporal association rules, selling periods are considered
   when computing the support value
2. Spatial association rules:
   1. Some products may not be sold in some stores, for example
   because of geographical, environmental, or political reasons
   2. The problem is to find common association patterns in subsets of
   stores, taking location into account
Cont..
• Solution: Apriori-like algorithm
– Covers rules that are applicable to the entire chain
without time restriction or to a subset of stores in
specific time intervals
– The format of the rules is similar to that of the
traditional rules
– Rules also contain information on store (location)
and time
Cont..
• Examples:
– In the second week of August, customers
purchase computers, printers, Internet and
wireless phone services jointly in electronics
stores near campus
– In January, customers purchase cold medicine,
humidifiers, coffee, and sunglasses together in
supermarkets near skiing resorts
Problem Definition
– Let {T1, T2, ..., Tm} be a set of mutually disjoint
time intervals (periods) that form a complete
partition of T
– Let P = {P1, P2, ..., Pq} be the set of stores, where Pj
(1 ≤ j ≤ q) denotes the jth store in the store chain
– Each transaction s in D carries a timestamp t and a
store identifier p, indicating the time and the store
at which the transaction occurred
– Let Sk ⊆ P and Rk ⊆ T be the sets of
stores and times in which item Ik is sold, respectively
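The notation above can be made concrete with a small data structure. The following is only a minimal sketch: the `Transaction` class, the toy database `D` and the helper `stores_and_periods` are hypothetical names introduced for illustration, and Sk and Rk are approximated here by the stores and periods in which an item actually appears in a transaction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    items: frozenset  # items bought together in this transaction
    t: str            # period label, one of T1..Tm
    p: str            # store identifier, one of P1..Pq

# Hypothetical toy database D (all values are made up for illustration).
D = [
    Transaction(frozenset({"computer", "printer"}), "T1", "P1"),
    Transaction(frozenset({"computer"}), "T2", "P1"),
    Transaction(frozenset({"printer", "coffee"}), "T2", "P2"),
]

def stores_and_periods(item, database):
    """Return (S_k, R_k): the stores and periods in which `item` occurs in D."""
    stores = {s.p for s in database if item in s.items}
    periods = {s.t for s in database if item in s.items}
    return stores, periods

print(stores_and_periods("printer", D))
```

In a real chain the sets Sk and Rk would more likely come from product-mix records than from the transactions themselves; the derivation above is just the simplest stand-in.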
“Apriori-like” Algorithm
• Some essential points:
– Let RFk denote the set of all relative-frequent k-itemsets,
Fk the set of all frequent k-itemsets, and Ck
the set of candidate k-itemsets
– Candidate k-itemsets are generated by
combining frequent (k-1)-itemsets, following the
anti-monotone property
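The candidate-generation step can be sketched as follows. This is a generic Apriori-style join-and-prune, not necessarily the exact procedure of the algorithm described in these slides; the function name `generate_candidates` and the toy set `F2` are assumptions made for illustration.

```python
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Build C_k from F_(k-1): join pairs whose union has k items, then prune."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Anti-monotone pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Hypothetical frequent 2-itemsets.
F2 = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
C3 = generate_candidates(F2, 3)
print(C3)
```

Here {A, B, D} and {B, C, D} are discarded without counting because {A, D} and {C, D} are not frequent, which is the saving the anti-monotone property provides.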
Cont..
• Algorithm in brief:
– The first step of the algorithm is to build the PT table for
each item in I
– Different phases of the algorithm:
• In the first phase, the database is scanned for the first time to
build a two-dimensional table, called the TS table, and to find the
frequent 1-itemsets
• In the kth phase of the algorithm, Ck is derived, and Fk is
generated by evaluating the supports of the candidates in Ck
• Since an RFk itemset must be a frequent itemset, RFk is
generated from Fk by evaluating the relative supports of
the itemsets X in Fk
Cont..
• PT table: Associates context (stores, time
intervals) with each item in I
– PT tables for individual items can be used to
determine the PT table for a given itemset X
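One plausible way to combine per-item PT tables is sketched below. It assumes, and this is an assumption rather than something the slides state, that a PT table is a binary store-by-period matrix (1 where the item is sold) and that the table for an itemset X is the element-wise AND of the tables of its items. The function name `itemset_pt` and the sample tables are hypothetical.

```python
def itemset_pt(pt_tables, itemset):
    """Combine per-item PT tables (lists of 0/1 rows) by element-wise AND."""
    tables = [pt_tables[item] for item in itemset]
    rows, cols = len(tables[0]), len(tables[0][0])
    return [
        [int(all(tbl[j][i] for tbl in tables)) for i in range(cols)]
        for j in range(rows)
    ]

# Hypothetical PT tables: rows are stores P1..P2, columns are periods T1..T3.
PT = {
    "humidifier": [[1, 1, 0],
                   [0, 1, 1]],
    "sunglasses": [[1, 0, 0],
                   [1, 1, 1]],
}
pt_x = itemset_pt(PT, ["humidifier", "sunglasses"])
print(pt_x)
```

Under this reading, a cell of the itemset's table is 1 only where every item of X is available in that store and period.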
Cont..
• PT table: The method to compute the jth row
of PT table for itemset X
Cont..
• Candidate itemsets: the candidate itemsets are generated
from the frequent itemsets found in the
previous phase
• Relative-frequent itemset:
– Because an RF itemset must be a frequent itemset,
RFk can be generated from Fk by computing the
relative supports of the itemsets X in Fk
– |DVX| can be obtained from the TS and PT tables of
X
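A minimal sketch of the relative-support computation follows. The slides do not show the contents of the TS table, so this sketch assumes it holds the number of transactions per store (row) and period (column), and that |DVX| is the sum of the TS cells in which the PT table of X is 1. The names `context_size`, `relative_support`, `TS` and `PT_X` and all numbers are hypothetical.

```python
def context_size(ts, pt_x):
    """|D_VX|: total transactions in the store/period cells where X is sold."""
    return sum(
        ts[j][i]
        for j in range(len(ts))
        for i in range(len(ts[0]))
        if pt_x[j][i]
    )

def relative_support(count_x, ts, pt_x):
    """Share of the transactions in X's context that contain X."""
    size = context_size(ts, pt_x)
    return count_x / size if size else 0.0

# Hypothetical tables: 2 stores (rows) by 3 periods (columns).
TS = [[100, 120, 80],
      [90, 110, 70]]
PT_X = [[1, 0, 0],
        [0, 1, 1]]

print(context_size(TS, PT_X))          # 100 + 110 + 70
print(relative_support(70, TS, PT_X))  # 70 transactions contain X
```

The point of the relative measure is that an itemset sold in only part of the chain is judged against the transactions where it could have occurred, not against the whole database.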
Conclusion
• Store-chain association rules are proposed specifically for a
multi-store environment, where stores may have different
product-mix strategies that can be adjusted over time.
• These rules have a distinct advantage over the traditional
ones because they contain store (location) and time
information, so they can be used not only for general or
local marketing strategies (depending on the results), but also
for product procurement, inventory, and distribution
strategies for the entire store chain.
Market Basket
• Market basket analysis is a powerful tool for implementing
cross-selling strategies.
• Problem Definition:
– The file with multiple transactions can be represented as a
relational database table T(customer, item).
– Here customer = {1, 2, 3, ..., n} and item = {a, b, c, ..., z}.
– The table T(customer, item) can be seen as the set of all
customer transactions Trans = {t1, t2, ..., tk}, where each
transaction contains a subset of the items, tk = {ia, ib, ..., iz}.
– The relational table can also be read as the relationship
between each item and the customers who buy it, called
item-clientele.
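The item-clientele view can be derived from the rows of T(customer, item) in one pass. The sketch below is illustrative only; the function name `item_clientele` and the sample rows are hypothetical.

```python
from collections import defaultdict

def item_clientele(rows):
    """Map each item to the set of customers who bought it."""
    clientele = defaultdict(set)
    for customer, item in rows:
        clientele[item].add(customer)
    return dict(clientele)

# Hypothetical rows of T(customer, item).
T = [(1, "a"), (1, "b"), (2, "a"), (3, "b"), (3, "c")]
print(item_clientele(T))
```

Storing a set of customers per item makes the later pairwise comparisons cheap, since co-purchase counts reduce to set intersections.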
Apriori Algorithm
• The Apriori algorithm has two important
characteristics.
• It is a level-wise algorithm, i.e. it traverses the itemset lattice one
level at a time, from the frequent 1-itemsets to the maximum size
of frequent itemsets.
• It uses a generate-and-test strategy for finding the frequent
itemsets: the support of each candidate is counted and
tested against the minsup threshold.
• Limitation
• The Apriori algorithm has exponential time complexity, and
several passes over the input table are needed. To overcome
these handicaps, the Similis algorithm is proposed.
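The level-wise, generate-and-test behaviour described above can be sketched compactly. This is a textbook-style simplification rather than an optimized implementation; the function name `apriori`, the toy baskets and the use of an absolute support count for `minsup` are assumptions made for illustration.

```python
from itertools import combinations

def apriori(baskets, minsup):
    """Return {k: {itemset: support count}} for all frequent k-itemsets."""
    def count(cands):
        # One pass over the baskets for the current level.
        return {c: sum(1 for b in baskets if c <= b) for c in cands}

    items = {frozenset([i]) for b in baskets for i in b}
    frequent = {c: n for c, n in count(items).items() if n >= minsup}
    levels, k = {}, 1
    while frequent:
        levels[k] = frequent
        k += 1
        prev = set(frequent)
        # Generate: join frequent (k-1)-itemsets, keeping unions of size k
        # whose (k-1)-subsets are all frequent.
        cands = {
            a | b for a in prev for b in prev
            if len(a | b) == k
            and all(frozenset(s) in prev for s in combinations(a | b, k - 1))
        }
        # Test: count each candidate and compare with minsup.
        frequent = {c: n for c, n in count(cands).items() if n >= minsup}
    return levels

# Hypothetical baskets.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, minsup=3)
for k, sets in result.items():
    print(k, {tuple(sorted(s)): n for s, n in sets.items()})
```

Each iteration of the loop corresponds to one level of the lattice and one scan of the data, which is where the repeated passes mentioned under the limitation come from.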
Similis Algorithm
• Similis is a Latin word meaning "similar".
• Similis consists of two steps:
• Problem Transformation: generation of the graph structure
• Search: finding the maximum-weight clique
• Algorithm Description:
STEP 1 - Data Transformation
input: table T(customer, item)
Generate graph G(V,E) using the similarities between items
output: weighted graph G(V,E)
STEP 2 - Finding the maximum-weighted clique
input: weighted graph G(V,E) and size k
Find in G(V,E) the clique S with k vertices with the maximum
weight, using the Primal-Tabu meta-heuristic
output: weighted clique S of size k that corresponds to the most
frequent market basket with k items
Problem Transformation
• As stated earlier, the first step transforms the table T(customer,
item) into condensed data by using a graph structure.
• A graph is a pair G = (V,E), where V is the set of vertices
and E is the set of edges of the graph.
• In the market basket case each vertex corresponds to an
item and each arc has a weight which represents the
distance between the adjacent vertices.
• The distance between two items is given by the
frequency with which the two items are bought together.
Cont…
• To find the values for the weighted graph G(V,E) some
similarity measures can be used.
• The similarity value of the two items will be high if
they are both included in frequent transactions.
• This means that if two items are frequently bought in
the same transactions, then they belong to a frequent
market basket.
• To create sets of items, an association
measure must be chosen; either similarity or distance
measures can be used.
Cont…
• For each pair of items (A,B) a similarity measure
SIM(A,B) can be found. If the items are bought
together many times they have a strong similarity;
if they are not usually bought together they have a
weak similarity.
• For all items, an item similarity matrix is generated,
which can be represented by the adjacency matrix of
the weighted graph G(V,E).
Similarity Measures
• The authors describe the following similarity measures.
• These measures use binary matrices and return normalized values
between 0 and 1.
• The Dice (sim1), Jaccard (sim2) and Cosine (sim3) coefficients are widely
used given their simplicity.
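The three coefficients can be written directly on the customer sets of two items. The sketch below uses the standard set-based definitions, which are assumed (not confirmed by the slides) to match sim1, sim2 and sim3; the function names are hypothetical, and the two customer sets are those listed for Diaper and Beer in the Example slide.

```python
from math import sqrt

def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a or b else 0.0

def cosine(a, b):
    """Cosine coefficient: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0

diaper, beer = {2, 3, 4, 5}, {2, 3, 4}
print(dice(diaper, beer), jaccard(diaper, beer), cosine(diaper, beer))
```

All three return 0 when the items share no customers and 1 when their customer sets are identical; they differ only in how strongly they penalize customers who bought just one of the two items.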
Weight Calculation
• A multiplicative model will be used to express the weight of an edge (A,B). The
weight takes into account both the similarity and the frequency of the items:

weight(A,B) = sim(A,B) . frequency(A,B)


• The similarity value of two items will be high if they are both included in the
same transactions.
• The frequency of the items must be considered to guarantee a correspondence
between high-weighted edges and items that appear in many transactions.
• There are several ways to define item frequency. In this work the authors opt for the
average of the relative frequencies of the two items, that is:

frequency(A,B) = (freq(A) + freq(B)) / 2, where freq(X) is the fraction of transactions that contain item X
Example

For a table T with 5 customers and 6 items, the item-clientele relationship is:
Bread = (1,2,4,5) Milk = (1,3,4,5) Diaper = (2,3,4,5) Beer = (2,3,4)
Eggs = (2) Coke = (3,5)
Matrix of Graph

G(V,E)   Bread   Milk    Diaper  Beer    Eggs    Coke
Bread    -       0.48    0.60    0.28    0.125   0.12
Milk     -       -       0.48    0.28    0       0.30
Diaper   -       -       -       0.525   0.125   0.30
Beer     -       -       -       -       0.133   0.125
Eggs     -       -       -       -       -       0
Coke     -       -       -       -       -       -

Adjacency matrix of the weighted graph G = (V,E)
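Entries such as Bread-Milk (0.48) and Diaper-Beer (0.525) can be recomputed from the item-clientele sets with a few lines. This sketch assumes the Jaccard coefficient as sim(A,B) and the average relative frequency as frequency(A,B); the choice of Jaccard is an assumption that agrees with most, though not necessarily all, of the entries above. The names `clientele`, `N_CUSTOMERS` and `weight` are hypothetical.

```python
# Item-clientele sets from the Example slide (customer ids 1..5).
clientele = {
    "Bread": {1, 2, 4, 5},
    "Milk": {1, 3, 4, 5},
    "Diaper": {2, 3, 4, 5},
    "Beer": {2, 3, 4},
    "Eggs": {2},
    "Coke": {3, 5},
}
N_CUSTOMERS = 5

def weight(a, b):
    """weight(A,B) = sim(A,B) * frequency(A,B), with Jaccard as sim (assumed)."""
    ca, cb = clientele[a], clientele[b]
    sim = len(ca & cb) / len(ca | cb)               # Jaccard coefficient
    freq = (len(ca) + len(cb)) / (2 * N_CUSTOMERS)  # average relative frequency
    return sim * freq

for pair in [("Bread", "Milk"), ("Diaper", "Beer"), ("Milk", "Coke")]:
    print(pair, round(weight(*pair), 3))
```

For Bread and Milk, three shared customers out of five in the union give a similarity of 0.6, and both items appear in four of five transactions, so the weight is 0.6 × 0.8.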


Searching for the Maximum-weighted Clique
• A clique can represent a common interest group.
• Given a graph representing the communication among a
group of individuals in an organization, each vertex
represents an individual, while edge (i, j) shows that
individual i regularly communicates with individual j.
• Our aim is to find the maximum-weighted clique in the graph.
• If a graph with weights on the edges is used, the most
heavily weighted clique corresponds to the common-interest group
whose elements communicate the most among themselves.
This structure allows the representation of sets of strongly
connected elements.
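For a graph as small as the six-item example, the maximum-weighted clique of a given size can be found by simply trying every subset. The sketch below is an exhaustive baseline, not the Primal-Tabu meta-heuristic; it uses the edge weights of the adjacency matrix shown earlier and treats zero-weight pairs as missing edges, which is an assumption. The names `W`, `ITEMS`, `w`, `clique_weight` and `best_clique` are hypothetical.

```python
from itertools import combinations

ITEMS = ["Bread", "Milk", "Diaper", "Beer", "Eggs", "Coke"]
# Upper triangle of the adjacency matrix from the slides.
W = {
    ("Bread", "Milk"): 0.48, ("Bread", "Diaper"): 0.60, ("Bread", "Beer"): 0.28,
    ("Bread", "Eggs"): 0.125, ("Bread", "Coke"): 0.12,
    ("Milk", "Diaper"): 0.48, ("Milk", "Beer"): 0.28, ("Milk", "Eggs"): 0.0,
    ("Milk", "Coke"): 0.30,
    ("Diaper", "Beer"): 0.525, ("Diaper", "Eggs"): 0.125, ("Diaper", "Coke"): 0.30,
    ("Beer", "Eggs"): 0.133, ("Beer", "Coke"): 0.125,
    ("Eggs", "Coke"): 0.0,
}

def w(a, b):
    """Symmetric edge-weight lookup; 0.0 means no edge."""
    return W.get((a, b), W.get((b, a), 0.0))

def clique_weight(nodes):
    return sum(w(a, b) for a, b in combinations(nodes, 2))

def is_clique(nodes):
    return all(w(a, b) > 0 for a, b in combinations(nodes, 2))

def best_clique(k):
    """Heaviest k-vertex clique by exhaustive search (fine only for tiny graphs)."""
    return max((c for c in combinations(ITEMS, k) if is_clique(c)), key=clique_weight)

best = best_clique(3)
print(best, round(clique_weight(best), 3))
```

The number of k-subsets grows combinatorially with the number of items, which is why a heuristic search is used on realistic product catalogues.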
Maximum Clique Problem
• The Maximum Clique Problem is an important problem in combinatorial
optimization.
• In market basket analysis it is used to find interesting combinations of
items.
• To solve this problem here, the Primal-Tabu algorithm is used for finding
the maximum-weighted clique.
• Conceptually, Primal-Tabu works with three neighbourhood
structures, N+, N- and N0, for the addition, removal and swap of a vertex
of the graph.
• At each step one new solution S' is chosen from the neighbourhood N(S)
of the current solution S.
• At each iteration the best solution found, S*, is updated whenever the
clique weight increases.
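The flavour of the search can be illustrated with a much-simplified tabu search. This is not the authors' Primal-Tabu algorithm: it keeps the solution size fixed at k, uses only the swap neighbourhood N0, treats every k-subset as feasible (zero-weight pairs simply add nothing), and uses a basic short-term tabu list. The function name `tabu_k_clique`, the parameters `iterations` and `tenure`, and the starting solution are all assumptions; the weights are those of the adjacency matrix shown earlier.

```python
from itertools import combinations

PAIRS = {
    ("Bread", "Milk"): 0.48, ("Bread", "Diaper"): 0.60, ("Bread", "Beer"): 0.28,
    ("Bread", "Eggs"): 0.125, ("Bread", "Coke"): 0.12, ("Milk", "Diaper"): 0.48,
    ("Milk", "Beer"): 0.28, ("Milk", "Coke"): 0.30, ("Diaper", "Beer"): 0.525,
    ("Diaper", "Eggs"): 0.125, ("Diaper", "Coke"): 0.30, ("Beer", "Eggs"): 0.133,
    ("Beer", "Coke"): 0.125,
}
W = {frozenset(p): v for p, v in PAIRS.items()}

def weight(a, b):
    return W.get(frozenset((a, b)), 0.0)

def tabu_k_clique(items, k, iterations=50, tenure=2):
    """Swap-based tabu search for a heavy k-vertex set (simplified sketch)."""
    def value(s):
        return sum(weight(a, b) for a, b in combinations(sorted(s), 2))

    current = set(items[:k])            # arbitrary starting solution S
    best, best_value = set(current), value(current)
    tabu = {}                           # vertex -> last iteration it stays tabu
    for it in range(iterations):
        moves = []
        for out in current:             # N0: swap one vertex out, one in
            for inn in items:
                if inn in current or tabu.get(inn, -1) >= it:
                    continue
                moves.append((value((current - {out}) | {inn}), out, inn))
        if not moves:
            continue                    # everything is tabu; wait for expiry
        val, out, inn = max(moves)      # best neighbour S', even if worse than S
        current = (current - {out}) | {inn}
        tabu[out] = it + tenure         # do not re-add the removed vertex yet
        if val > best_value:            # update S* only on improvement
            best, best_value = set(current), val
    return best, best_value

start_order = ["Eggs", "Coke", "Beer", "Bread", "Milk", "Diaper"]
best, best_value = tabu_k_clique(start_order, k=3)
print(sorted(best), round(best_value, 3))
```

Accepting the best neighbour even when it is worse than the current solution, while forbidding the immediate reversal of a move, is what lets a tabu search leave local optima; the full method also adds and removes vertices through N+ and N-.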
Comparison
• Apriori with a support of 3 gives the frequent item choice (Bread, Milk, Diaper).
• The aim here is to check whether the graph gives the same result using the
maximum-weight clique method.
• Before doing so, since the support is 3 and the edge weights lie between 0 and 1,
the support of 3 must be normalized to the range 0 to 1.
Result
[Figure: graph over the items Bread, Milk, Diaper, Beer, Eggs and Coke. Only those edges whose weights are more than 0.40 are considered; the edge weights shown are 0.48, 0.60, 0.48 and 0.525.]
Conclusion
• The main disadvantage of the Apriori algorithm is its
exponential time complexity, since it performs many passes
over the data.
• With few items or sparse data the algorithm is efficient,
but with correlated data its performance degrades
significantly.
• The Similis algorithm has lower computational
complexity, thus allowing the resolution of a greater number
of real problems.
• In this innovative approach, the condensed data is obtained
by transforming the market basket problem into a
maximum-weighted clique problem.
