
Aim: Perform data pre-processing tasks and demonstrate association rule mining on data sets.
Procedure:
1. Open a relation in Weka using the following options:
Open file → weather.arff relation
2. Select Edit
3. Click on cells in the data and delete a few values from the relation
4. To replace the missing values automatically, follow the options:
Choose → filters → unsupervised → attribute → ReplaceMissingValues
5. To replace the missing values with values of your own choice, follow the options:
Choose → filters → unsupervised → attribute → ReplaceMissingWithUserConstant
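
The two filters in steps 4 and 5 can be pictured with a minimal Python sketch. It assumes the automatic filter fills a numeric attribute with its mean and a nominal attribute with its mode, and that the user-constant filter simply writes the value you supply. The function names and the sample columns are hypothetical and are not the edited weather relation.

```python
from statistics import mean, mode

MISSING = None  # stand-in for the "?" marker used in ARFF files


def replace_with_mean_or_mode(column):
    """Fill gaps with the mean (numeric column) or the mode (nominal column)."""
    present = [v for v in column if v is not MISSING]
    numeric = all(isinstance(v, (int, float)) for v in present)
    fill = mean(present) if numeric else mode(present)
    return [fill if v is MISSING else v for v in column]


def replace_with_constant(column, constant):
    """Fill gaps with a value chosen by the user."""
    return [constant if v is MISSING else v for v in column]


# Hypothetical columns with one deleted value each.
humidity = [85, 90, None, 96, 80]
outlook = ["sunny", "sunny", None, "rainy", "sunny"]

print(replace_with_mean_or_mode(humidity))        # numeric: gap takes the mean
print(replace_with_mean_or_mode(outlook))         # nominal: gap takes the mode
print(replace_with_constant(outlook, "unknown"))  # gap takes the user constant
```

The automatic variant keeps the attribute's overall distribution roughly intact, while the constant variant makes the replaced cells easy to spot later.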

Original Dataset:
Edited dataset:

Replacing missing values:

Replacing missing values with user constants:


Demonstrating Apriori algorithm by discretizing the data:

Procedure:
1. Open a relation in Weka using the following options:
Open file → weather.arff relation
2. To discretize the data, follow the options:
Choose → filters → unsupervised → attribute → Discretize
3. After discretization, apply the Apriori algorithm with the following steps:
Associate → choose → Apriori → start
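
The interval labels that the discretized attributes carry (for example '(68.2-70.3]') can be reproduced with a short sketch of equal-width binning. It assumes 10 bins, as in the -B10 option of the relation name, and uses the temperature values of the numeric weather dataset as remembered from the copy distributed with Weka; treat those values and the function names as assumptions.

```python
from bisect import bisect_left


def equal_width_cut_points(values, bins):
    """Split the range [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [round(lo + width * i, 6) for i in range(1, bins)]


def interval_label(value, cuts):
    """Return the label of the half-open interval (a-b] that contains value."""
    i = bisect_left(cuts, value)  # a value equal to a cut point falls in the lower bin
    if i == 0:
        return f"(-inf-{cuts[0]:g}]"
    if i == len(cuts):
        return f"({cuts[-1]:g}-inf)"
    return f"({cuts[i - 1]:g}-{cuts[i]:g}]"


# Assumed temperature column of the numeric weather data (14 instances).
temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

cuts = equal_width_cut_points(temperature, bins=10)
print(cuts)
print([interval_label(t, cuts) for t in temperature])
```

With a range of 64 to 85 each bin is 2.1 wide, which is why a 14-instance relation ends up with many sparsely filled intervals.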
Apriori Analysis:
=== Run information ===

Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: weather1-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6
Instances: 14
Attributes: 5
    outlook
    temperature
    humidity
    windy
    play
=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.15 (2 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 17
Size of set of large itemsets L(2): 34
Size of set of large itemsets L(3): 13
Size of set of large itemsets L(4): 1

Best rules found:

1. outlook=overcast 4 ==> play=yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
2. humidity='(89.8-92.9]' 3 ==> windy=TRUE 3 <conf:(1)> lift:(2.33) lev:(0.12) [1] conv:(1.71)
3. outlook=rainy play=yes 3 ==> windy=FALSE 3 <conf:(1)> lift:(1.75) lev:(0.09) [1] conv:(1.29)
4. outlook=rainy windy=FALSE 3 ==> play=yes 3 <conf:(1)> lift:(1.56) lev:(0.08) [1] conv:(1.07)
5. humidity='(77.4-80.5]' 2 ==> outlook=rainy 2 <conf:(1)> lift:(2.8) lev:(0.09) [1] conv:(1.29)
6. temperature='(-inf-66.1]' 2 ==> windy=TRUE 2 <conf:(1)> lift:(2.33) lev:(0.08) [1] conv:(1.14)
7. temperature='(68.2-70.3]' 2 ==> windy=FALSE 2 <conf:(1)> lift:(1.75) lev:(0.06) [0] conv:(0.86)
8. temperature='(68.2-70.3]' 2 ==> play=yes 2 <conf:(1)> lift:(1.56) lev:(0.05) [0] conv:(0.71)
9. temperature='(74.5-76.6]' 2 ==> play=yes 2 <conf:(1)> lift:(1.56) lev:(0.05) [0] conv:(0.71)
10. humidity='(83.6-86.7]' 2 ==> temperature='(82.9-inf)' 2 <conf:(1)> lift:(7) lev:(0.12) [1] conv:(1.71)
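
The "Number of cycles performed" and "Minimum support" lines of the listing above are related. A minimal sketch, assuming each cycle lowers the support threshold by the delta (-D 0.05), starting from the upper bound (-U 1.0) and never going below the lower bound (-M 0.1); in a real run the loop ends as soon as the requested number of rules (-N 10) is found. The function name is hypothetical.

```python
def support_schedule(upper_bound=1.0, delta=0.05, lower_bound=0.1):
    """Yield (cycle, minimum support) pairs, one per cycle of threshold lowering."""
    steps = int(round((upper_bound - lower_bound) / delta))
    for cycle in range(1, steps + 1):
        yield cycle, round(upper_bound - cycle * delta, 2)


schedule = dict(support_schedule())
print(schedule[17])  # threshold in force after 17 cycles
```

Under these assumptions 17 cycles correspond to a threshold of 0.15, which agrees with the minimum support reported for the weather relation.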
Aim: Listing all the categorical (or nominal) attributes and the real-valued attributes separately.

Procedure:
1. Open the relation credit-g.arff
2. Inspect each attribute and place it in either the nominal list or the real-valued list
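
The same split can be made outside the GUI by reading the @attribute lines of the ARFF header. The sketch below is an illustration only: the header text is a short hypothetical excerpt (the nominal value lists are written from memory and the file is not complete), and the parser assumes attribute names without spaces or quotes.

```python
# Hypothetical excerpt of an ARFF header, not the full credit-g.arff file.
ARFF_HEADER = """\
@relation german_credit
@attribute checking_status {'<0','0<=X<200','>=200','no checking'}
@attribute duration numeric
@attribute credit_amount numeric
@attribute class {good,bad}
@data
"""


def split_attributes(header_text):
    """Return (nominal, real_valued) attribute names found in an ARFF header."""
    nominal, real_valued = [], []
    for line in header_text.splitlines():
        parts = line.strip().split(None, 2)  # keyword, name, type declaration
        if len(parts) < 3 or parts[0].lower() != "@attribute":
            continue
        name, declared_type = parts[1], parts[2].strip()
        if declared_type.startswith("{"):  # explicit list of values: nominal
            nominal.append(name)
        elif declared_type.lower() in ("numeric", "real", "integer"):
            real_valued.append(name)
    return nominal, real_valued


nominal, real_valued = split_attributes(ARFF_HEADER)
print("Nominal attributes:", nominal)
print("Real-valued attributes:", real_valued)
```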

Nominal attributes:
Real-valued Attributes:
Aim: Generate simple rules in plain English for credit risk assessment using selected attributes.

Procedure:
1. Select the attributes
• Checking_status
• Credit_history
• Duration
• Credit_amount
• Savings_status
• Installment_commitment
• Other_payment_plans
• Existing_credits
2. Discretize the relation using the following steps:
Choose → filters → unsupervised → attribute → Discretize
3. Right-click on the filter box, click on Show properties and change the following:
bins = 5
useEqualFrequency = True
4. Generate rules using Apriori by following the steps:
Associate → choose → Apriori → start
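
Equal-frequency binning, as selected in step 3, places the cut points so that each bin receives roughly the same number of instances instead of the same width. The sketch below is a simplified quantile-style version and is not Weka's exact procedure, whose handling of tied values differs; the function name and the sample values are hypothetical.

```python
def equal_frequency_cut_points(values, bins):
    """Place cut points midway between neighbours at equal-count positions."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        k = n * i // bins  # index of the first value of the next bin
        if k == 0 or ordered[k - 1] == ordered[k]:
            continue  # no room for a cut between two identical values
        cut = (ordered[k - 1] + ordered[k]) / 2
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


# Hypothetical values, not taken from credit-g.arff.
durations = list(range(1, 21))
print(equal_frequency_cut_points(durations, bins=5))

# An attribute with few distinct values yields fewer than bins - 1 cut points.
print(equal_frequency_cut_points([1] * 8 + [2] * 2, bins=5))
```

The second call shows why a discretized attribute with only a handful of distinct values can end up with far fewer than five intervals.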

Discretized relation attributes:


Generated rules:
=== Run information ===

Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: german_credit-weka.filters.unsupervised.attribute.Discretize-F-B5-M-1.0-Rfirst-last-precision6
Instances: 1000
Attributes: 21
    checking_status
    duration
    credit_history
    purpose
    credit_amount
    savings_status
    employment
    installment_commitment
    personal_status
    other_parties
    residence_since
    property_magnitude
    age
    other_payment_plans
    housing
    existing_credits
    job
    num_dependents
    own_telephone
    foreign_worker
    class
=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.7 (700 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 6

Generated sets of large itemsets:

Size of set of large itemsets L(1): 6
Size of set of large itemsets L(2): 5
Size of set of large itemsets L(3): 2

Best rules found:

1. other_parties=none num_dependents='(-inf-1.5]' 767 ==> foreign_worker=yes 749 <conf:(0.98)> lift:(1.01) lev:(0.01) [10] conv:(1.49)
2. other_parties=none 907 ==> foreign_worker=yes 880 <conf:(0.97)> lift:(1.01) lev:(0.01) [6] conv:(1.2)
3. num_dependents='(-inf-1.5]' 845 ==> foreign_worker=yes 819 <conf:(0.97)> lift:(1.01) lev:(0.01) [5] conv:(1.16)
4. other_parties=none other_payment_plans=none 742 ==> foreign_worker=yes 718 <conf:(0.97)> lift:(1) lev:(0) [3] conv:(1.1)
5. other_payment_plans=none 814 ==> foreign_worker=yes 782 <conf:(0.96)> lift:(1) lev:(-0) [-1] conv:(0.91)
6. other_payment_plans=none foreign_worker=yes 782 ==> other_parties=none 718 <conf:(0.92)> lift:(1.01) lev:(0.01) [8] conv:(1.12)
7. num_dependents='(-inf-1.5]' foreign_worker=yes 819 ==> other_parties=none 749 <conf:(0.91)> lift:(1.01) lev:(0.01) [6] conv:(1.07)
8. foreign_worker=yes 963 ==> other_parties=none 880 <conf:(0.91)> lift:(1.01) lev:(0.01) [6] conv:(1.07)
9. other_payment_plans=none 814 ==> other_parties=none 742 <conf:(0.91)> lift:(1.01) lev:(0) [3] conv:(1.04)
10. num_dependents='(-inf-1.5]' 845 ==> other_parties=none 767 <conf:(0.91)> lift:(1) lev:(0) [0] conv:(0.99)
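
The figures printed after each rule can be checked by hand from the instance counts. The sketch below does this for rule 8, using only counts that appear in the listing: foreign_worker=yes covers 963 instances, 880 of which also have other_parties=none, and other_parties=none covers 907 of the 1000 instances (rule 2). The function name is hypothetical, and the +1 in the conviction denominator is an assumption, adopted because it reproduces the printed values.

```python
def rule_metrics(premise, consequent, both, total):
    """Compute rule quality measures from instance counts.

    premise    -- instances matching the left-hand side
    consequent -- instances matching the right-hand side
    both       -- instances matching both sides
    total      -- instances in the relation
    """
    confidence = both / premise
    lift = confidence / (consequent / total)
    leverage = both / total - (premise / total) * (consequent / total)
    # Assumption: +1 smoothing in the denominator, chosen to match the listing.
    conviction = premise * (total - consequent) / total / (premise - both + 1)
    return {
        "conf": round(confidence, 2),
        "lift": round(lift, 2),
        "lev": round(leverage, 2),
        "lev_instances": int(leverage * total),  # bracketed number in the listing
        "conv": round(conviction, 2),
    }


# Rule 8: foreign_worker=yes 963 ==> other_parties=none 880
rule8 = rule_metrics(premise=963, consequent=907, both=880, total=1000)
print(rule8)
```

A lift this close to 1 means that the premise and the consequent are almost independent, so the rules in this listing reflect the two values being very common rather than a real association; lowering the upper bound of the minimum support or restricting the attributes would be needed to surface more informative rules.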
