Identifying Interesting Association Rules with Genetic Algorithms

Elnaz Delpisheh
York University
Department of Computer Science and Engineering
November 17, 2016

Data mining
Too much data

I = {i1, i2, ..., in} is a set of items.
D = {t1, t2, ..., tn} is a transactional database.
Each ti is a nonempty subset of I.
An association rule is of the form A → B, where A and B are itemsets, A ⊆ I, B ⊆ I, and A ∩ B = ∅.
The Apriori algorithm is the one most commonly used for association rule mining.
Example: {milk, eggs} → {bread}.
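
The definitions above can be written down directly as set operations. The following is a minimal sketch, assuming items are plain strings; the item names, the three transactions, and the helper names `is_valid_database` and `is_valid_rule` are hypothetical and only illustrate the conditions stated on the slide.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical item set I and transactional database D (not from the slides).
ITEMS: Itemset = frozenset({"milk", "eggs", "bread", "butter"})
DATABASE: List[Itemset] = [
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"eggs", "butter"}),
]


def is_valid_database(database: List[Itemset], items: Itemset) -> bool:
    """Every transaction must be a nonempty subset of I."""
    return all(len(t) > 0 and t <= items for t in database)


def is_valid_rule(antecedent: Itemset, consequent: Itemset, items: Itemset) -> bool:
    """A -> B requires A and B to be subsets of I with an empty intersection."""
    return (
        antecedent <= items
        and consequent <= items
        and antecedent.isdisjoint(consequent)
    )


rule_ok = is_valid_rule(frozenset({"milk", "eggs"}), frozenset({"bread"}), ITEMS)
rule_overlapping = is_valid_rule(frozenset({"milk"}), frozenset({"milk"}), ITEMS)
```

Here `rule_ok` is true for {milk, eggs} → {bread}, while `rule_overlapping` is false because the two sides share an item.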

[Diagram: Data → Data Mining → Association rules]

Apriori Algorithm
TID    List of item IDs
T100   I1, I2, I3
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
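
To make the level-wise idea behind Apriori concrete, here is a minimal sketch run on the six transactions in the table above. The minimum support count of 2 is an assumption chosen for illustration, and the function names are hypothetical; this is a textbook-style join-and-prune loop, not necessarily the exact variant shown on the slides.

```python
from itertools import combinations

# The six transactions from the table above.
TRANSACTIONS = {
    "T100": {"I1", "I2", "I3"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
}


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions.values() if itemset <= t)


def apriori(transactions, min_count):
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted({i for t in transactions.values() for i in t})
    level = [frozenset([i]) for i in items]
    level = [c for c in level if support_count(c, transactions) >= min_count]
    frequent = {}
    k = 1
    while level:
        for c in level:
            frequent[c] = support_count(c, transactions)
        k += 1
        previous = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        level = [c for c in candidates
                 if support_count(c, transactions) >= min_count]
    return frequent


# min_count=2 is an assumed threshold, not a value from the slides.
frequent_itemsets = apriori(TRANSACTIONS, min_count=2)
```

With this threshold, all four single items and the pairs {I1, I2}, {I1, I3}, {I2, I3} and {I2, I4} are frequent; {I1, I2, I3} survives pruning but occurs in only one transaction, so it is dropped.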

Apriori Algorithm (Cont.)

Association rule mining

[Diagram: Data (too much data) → Data Mining → Association rules (too many association rules)]

Interestingness criteria
Comprehensibility.
Conciseness.
Diversity.
Generality.
Novelty.
Utility.
...

Interestingness measures
Subjective measures
The data and the user's prior knowledge are considered.
Comprehensibility, novelty, surprisingness, utility.
Objective measures
The structure of an association rule is considered.
Conciseness, diversity, generality, peculiarity.
Example: Support
It represents the generality of a rule.
It counts the number of transactions containing both A and B.
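
As a small worked example of support, the sketch below counts the transactions that contain both sides of a rule A → B and also reports that count as a fraction of the database. The four transactions and the function names are hypothetical.

```python
def support_count(antecedent, consequent, database):
    """Number of transactions containing every item of both A and B."""
    both = set(antecedent) | set(consequent)
    return sum(1 for transaction in database if both <= set(transaction))


def support(antecedent, consequent, database):
    """Support count expressed as a fraction of all transactions."""
    return support_count(antecedent, consequent, database) / len(database)


# Hypothetical database, for illustration only.
DATABASE = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"bread"},
    {"milk", "eggs", "bread", "butter"},
]

count = support_count({"milk", "eggs"}, {"bread"}, DATABASE)
fraction = support({"milk", "eggs"}, {"bread"}, DATABASE)
```

Two of the four transactions contain milk, eggs and bread together, so the count is 2 and the relative support is 0.5.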

Drawbacks of objective measures
Database dependence
Lack of knowledge about the database
Threshold dependence
Solution
Multiple database reanalysis
Problem
o Large number of disk I/O operations
Database independence

Genetic algorithm-based learning (ARMGA)
1. Initialize population
2. Evaluate individuals in population
3. Repeat until a stopping criterion is met
   A. Select individuals from the current population
   B. Recombine them to obtain more individuals
   C. Evaluate new individuals
   D. Replace some or all of the individuals of the current population with offspring
4. Return the best individual seen so far
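
The steps above can be sketched as a generic loop. This is a minimal illustration of the control flow only: the toy problem (maximising the number of 1 bits in a 12-bit string), the tournament selection, the one-point crossover, the 5% mutation rate, and the fixed generation count used as the stopping criterion are all assumptions, not ARMGA's actual operators.

```python
import random


def genetic_search(init, fitness, select, recombine, generations, rng):
    population = init(rng)                                   # 1. initialize
    scored = [(fitness(ind), ind) for ind in population]     # 2. evaluate
    best = max(scored, key=lambda pair: pair[0])
    for _ in range(generations):                             # 3. repeat
        parents = select(scored, rng)                        # A. select
        offspring = recombine(parents, rng)                  # B. recombine
        scored = [(fitness(ind), ind) for ind in offspring]  # C. evaluate, D. replace all
        best = max([best] + scored, key=lambda pair: pair[0])
    return best                                              # 4. best seen so far


# Toy problem (assumed): individuals are lists of 12 bits, fitness is the bit sum.
def init(rng):
    return [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]


def select(scored, rng):
    # Binary tournament: the fitter of two random individuals becomes a parent.
    return [max(rng.sample(scored, 2), key=lambda pair: pair[0])[1] for _ in scored]


def recombine(parents, rng):
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = rng.randrange(1, len(a))  # one-point crossover
        for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
            # Flip each bit with a small, assumed mutation probability.
            children.append([bit ^ 1 if rng.random() < 0.05 else bit for bit in child])
    return children


best_fitness, best_individual = genetic_search(
    init, sum, select, recombine, generations=30, rng=random.Random(0)
)
```

Because the loop keeps track of the best individual seen so far, the returned fitness can never be lower than the best fitness in the initial population.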

ARMGA Modeling
Given an association rule X → Y
Requirement
Conf(X → Y) > Supp(Y)
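
The requirement can be checked numerically. The sketch below assumes the usual definitions, with Supp as the fraction of transactions containing an itemset and Conf(X → Y) = Supp(X ∪ Y) / Supp(X), and reuses the six transactions from the Apriori slide; the function names are hypothetical.

```python
# Transactions T100..T600 from the Apriori slide.
DATABASE = [
    {"I1", "I2", "I3"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
]


def supp(itemset, database):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in database if itemset <= t) / len(database)


def conf(x, y, database):
    """Conf(X -> Y) = Supp(X u Y) / Supp(X); 0.0 when X never occurs."""
    supp_x = supp(x, database)
    return supp(x | y, database) / supp_x if supp_x > 0 else 0.0


def meets_requirement(x, y, database):
    """The slide's requirement: Conf(X -> Y) > Supp(Y)."""
    return conf(x, y, database) > supp(y, database)


rule_a = meets_requirement({"I1"}, {"I2"}, DATABASE)
rule_b = meets_requirement({"I4"}, {"I2"}, DATABASE)
```

For {I1} → {I2} the confidence is 2/3 while Supp({I2}) is 5/6, so the requirement fails: knowing I1 does not make I2 more likely than it already is. For {I4} → {I2} the confidence is 1, which exceeds 5/6, so the requirement holds.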

Aim is to maximise

ARMGA Encoding
Michigan Strategy
Given an association k-rule X → Y, where X, Y ⊂ I, I is a set of items I = {i1, i2, ..., in}, and X ∩ Y = ∅.
For example
{A1, ..., Aj} → {Aj+1, ..., Ak}


ARMGA Encoding (Cont.)


The aforementioned encoding depends heavily on the length of the chromosome.
We use another type of encoding:
Given a set of items {A,B,C,D,E,F}
Association rule ACF → B is encoded as follows
00A11B00C01D10E00F
00: Item is in the antecedent
11: Item is in the consequent
01/10: Item is absent
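
A minimal sketch of this two-bit encoding follows, assuming single-character item labels and a fixed item order. The function names are hypothetical, and the choice to always write absent items as 01 when encoding (while accepting both 01 and 10 when decoding) is an assumption of this sketch.

```python
ITEMS = ["A", "B", "C", "D", "E", "F"]

ANTECEDENT, CONSEQUENT = "00", "11"
ABSENT_CODES = ("01", "10")


def encode(antecedent, consequent, items=ITEMS):
    """Build the chromosome string: a 2-bit code followed by each item label."""
    genes = []
    for item in items:
        if item in antecedent:
            code = ANTECEDENT
        elif item in consequent:
            code = CONSEQUENT
        else:
            code = ABSENT_CODES[0]  # assumed default for absent items
        genes.append(code + item)
    return "".join(genes)


def decode(chromosome, items=ITEMS):
    """Recover (antecedent, consequent) from a chromosome string."""
    antecedent, consequent = set(), set()
    for position, item in enumerate(items):
        gene = chromosome[3 * position: 3 * position + 3]
        code, label = gene[:2], gene[2:]
        if label != item:
            raise ValueError(f"expected item {item!r} at gene {position}, got {label!r}")
        if code == ANTECEDENT:
            antecedent.add(item)
        elif code == CONSEQUENT:
            consequent.add(item)
        elif code not in ABSENT_CODES:
            raise ValueError(f"unknown code {code!r} for item {item!r}")
    return antecedent, consequent


encoded = encode({"A", "C", "F"}, {"B"})
decoded = decode("00A11B00C01D10E00F")
```

The chromosome length is fixed by the number of items rather than by the length of the rule, which is the property this encoding is chosen for.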


ARMGA Operators
Select
Crossover
Mutation


ARMGA Operators-Select
Select(c, ps): Acts as a filter of the chromosome
c: Chromosome
ps: Pre-specified probability
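
One plausible reading of this operator is a probabilistic filter: each chromosome is kept with probability ps. The sketch below is an assumption made for illustration; the slide does not give the body of Select, and this version ignores the content of the chromosome. The `rng` parameter and `filter_population` helper are hypothetical.

```python
import random


def select(chromosome, ps, rng):
    """Return True if the chromosome passes the filter.

    Assumed behaviour: pass with probability ps, independent of the
    chromosome itself. The slide does not specify more than this.
    """
    return rng.random() < ps


def filter_population(population, ps, rng):
    """Keep only the chromosomes that pass select()."""
    return [c for c in population if select(c, ps, rng)]


rng = random.Random(1)
population = ["00A11B", "11A00B", "01A11B"]
kept_all = filter_population(population, 1.0, rng)
kept_none = filter_population(population, 0.0, rng)
```

With ps = 1.0 every chromosome is kept and with ps = 0.0 none is; values in between thin the population at random.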


ARMGA Operators-Crossover
This operation uses a two-point strategy
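
A generic two-point crossover can be sketched as follows: two cut points are drawn, and the segment between them is swapped between the parents. Representing a chromosome as a list of two-bit gene codes is an assumption of this sketch, as are the function and parameter names; the slides do not show ARMGA's exact procedure.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points of equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # Two distinct cut positions in 0..len, sorted so that low < high.
    low, high = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:low] + parent_b[low:high] + parent_a[high:]
    child_b = parent_b[:low] + parent_a[low:high] + parent_b[high:]
    return child_a, child_b


# Hypothetical parents: one gene code per item, in a fixed item order.
parent_a = ["00", "11", "00", "01", "10", "00"]
parent_b = ["11", "00", "01", "00", "11", "10"]
child_a, child_b = two_point_crossover(parent_a, parent_b, random.Random(3))
```

At every position the two children together carry exactly the two genes the parents had there, so no gene is lost or duplicated. A child may still decode to a rule with an empty antecedent or consequent, which a full implementation would need to handle.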


ARMGA Operators-Mutate


ARMGA Initialization


ARMGA Algorithm


Empirical studies and Evaluation
Implement the entire procedure using Visual C++
Use WEKA to produce interesting association rules
Compare the results