Hemant Bhagwan Sawale FMS-MBA-2019-21-006 Sec: A

1. Construct clustering using the K-means algorithm.
i. Use a suitable data set: Weka library data (Iris data set).
ii. Use the Weka software tool.
INFERENCE
The data used for clustering contains details about Iris flowers. The data consists of 5 attributes,
namely:
• Sepal length: specifies the sepal length of an iris flower.
• Sepal width: specifies the sepal width of an iris flower.
• Petal length: specifies the petal length of an iris flower.
• Petal width: specifies the petal width of an iris flower.
• Class: specifies the species of the genus Iris.
The data set consists of 150 instances with varying attribute values.
A sample of the data is shown in Figure A (1).
Figure: A (1)
The data set was pre-processed, and the result is shown in Figure A (2).
Figure: A (2)
Weka is used to perform clustering on the given data set. The K-means algorithm, one of the most
commonly used clustering algorithms, is applied. In Weka, the clustering tool used is
‘SimpleKMeans’. The clustered output is shown in Figure A (3) and Figure A (4).
Figure: A (3)
Figure: A (4)
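The clustering step can be illustrated outside Weka with a minimal Python sketch of the basic K-means loop (assign each instance to its nearest centroid, then recompute each centroid as the mean of its members). This is only an illustration under stated assumptions, not Weka's SimpleKMeans implementation: the six instances are made-up values, the starting centroids are picked by hand, and the names kmeans and squared_distance are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iterations=100):
    """Basic K-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to the index of its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its current members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Made-up instances (sepal length, sepal width, petal length, petal width);
# these are NOT rows copied from the Iris file.
points = [
    (5.0, 3.4, 1.5, 0.2), (4.9, 3.1, 1.4, 0.2), (5.1, 3.5, 1.4, 0.3),
    (6.3, 2.9, 4.9, 1.6), (6.1, 2.8, 4.7, 1.4), (6.5, 3.0, 5.2, 1.9),
]
# Hand-picked starting centroids (an assumption; Weka chooses them from a random seed).
centroids, assignment = kmeans(points, [points[0], points[3]])
sizes = [assignment.count(c) for c in range(2)]
print(assignment, sizes)
```

The returned centroids are cluster means, so they generally differ from the starting instances used to initialise the run.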
CONCLUSION
After clustering, we can see that there are 2 clusters:
Cluster 0:
• sepallength: 6.1 sepalwidth: 2.9 petallength: 4.7 petalwidth: 1.4 class: Iris-versicolor
Cluster 1:
• sepallength: 6.2 sepalwidth: 2.9 petallength: 4.3 petalwidth: 1.3 class: Iris-versicolor
The clustered instances show that 67% (100 instances) of the data set belongs to Cluster 0 and 33% (50
instances) belongs to Cluster 1. We can conclude that the larger cluster, Cluster 0, is labelled
Iris-versicolor and has the following characteristics:
sepallength: 6.1, sepalwidth: 2.9, petallength: 4.7, petalwidth: 1.4
-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-
2. Construct association rules using the Apriori algorithm.
i. Use a suitable data set: Weka library data (Supermarket data set).
ii. Use the Weka software tool.
INFERENCE
The data used contains details about a supermarket. The data consists of 217 attributes, some of which
are:
• Different departments
• Different types of commodities such as coffee, spices, confectionery, medicine and food items
The data set contains 4627 instances, indicating that 4627 customers purchased items from the
supermarket. A sample of the data set is shown in Figure B (1).
Figure: B (1)
A ‘t’ in the data marks an item that the customer has purchased. A blank cell indicates that the
customer has not bought that item.
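As an illustration of this encoding, the following Python sketch turns a wide table of ‘t’ and blank cells into one set of purchased items per customer. The three rows are made-up examples, the four column names are only a small selection of the attributes mentioned in this report, and the helper name rows_to_transactions is hypothetical.

```python
def rows_to_transactions(header, rows, purchased_marker="t"):
    """Return one set of item names per row, keeping only cells marked as purchased."""
    return [
        {name for name, cell in zip(header, row) if cell.strip() == purchased_marker}
        for row in rows
    ]


header = ["bread and cake", "milk-cream", "fruit", "vegetables"]
# Made-up rows: 't' means purchased, an empty string means not purchased.
rows = [
    ["t", "t", "", ""],
    ["t", "", "t", "t"],
    ["", "", "", ""],
]
transactions = rows_to_transactions(header, rows)
for basket in transactions:
    print(sorted(basket))
```

Any cell that is not exactly ‘t’ is treated as "not purchased" here, which is a simplifying assumption of the sketch.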
The pre-processed data is shown in Figure B (2).
Figure: B (2)
To run the Apriori algorithm on the given data, we use the Weka associator ‘Apriori’, where we
specify the minimum confidence that a rule must reach to be reported.
The associator output is shown in Figure B (3) and Figure B (4).
Figure: B (3)
Figure B (3) shows the rules for the data set with a minimum confidence of 0.5 (50%) and a minimum
support of 0.45 (45%, or 2082 instances).
Figure: B (4)
Figure B (4) shows the rules for the data set with a minimum confidence of 0.75 (75%) and a minimum
support of 0.35 (35%, or 1619 instances).
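The support figures above can be related to the idea of large (frequent) item sets with a short Python sketch. It converts a support fraction into an instance count and then finds the large item sets level by level (L1, then L2, and so on) on four made-up baskets. This is a sketch under assumptions, not Weka's Apriori code: the rounding rule in min_support_count is assumed (it happens to reproduce the 2082 and 1619 reported above), the function names are hypothetical, and a single fixed threshold is used rather than Weka's repeated cycles.

```python
from itertools import combinations


def min_support_count(n_instances, min_support):
    """Support fraction -> number of instances (rounding rule is an assumption)."""
    return round(n_instances * min_support)


def frequent_itemsets(transactions, min_count):
    """Level-wise search: keep itemsets of size k that occur in >= min_count baskets."""
    items = sorted({item for basket in transactions for item in basket})
    level = [frozenset([item]) for item in items]  # candidate 1-itemsets
    result, k = {}, 1
    while level:
        counts = {c: sum(1 for basket in transactions if c <= basket) for c in level}
        large = {c: n for c, n in counts.items() if n >= min_count}
        if not large:
            break
        result[k] = large
        k += 1
        # Join step: unions of two large itemsets that have exactly k items.
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        level = [
            c for c in candidates
            if all(frozenset(s) in large for s in combinations(c, k - 1))
        ]
    return result


print(min_support_count(4627, 0.45), min_support_count(4627, 0.35))

# Made-up baskets, not taken from the supermarket data set.
baskets = [{"bread", "milk"}, {"bread", "fruit"}, {"bread", "milk", "fruit"}, {"milk"}]
large_sets = frequent_itemsets(baskets, min_count=2)
for size, sets in large_sets.items():
    print(f"L{size}:", len(sets))
```

Lowering the minimum count admits more item sets at each level, which is in line with the larger L1, L2 and L3 counts reported for the run with the lower support.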
CONCLUSION
1. When a minimum confidence of 50% was used, 11 cycles were performed and 10 rules
were obtained:
   1. biscuits=t 2605 ==> bread and cake=t 2083 conf: (0.8)
   2. milk-cream=t 2939 ==> bread and cake=t 2337 conf: (0.8)
   3. fruit=t 2962 ==> bread and cake=t 2325 conf: (0.78)
   4. baking needs=t 2795 ==> bread and cake=t 2191 conf: (0.78)
   5. frozen foods=t 2717 ==> bread and cake=t 2129 conf: (0.78)
   6. vegetables=t 2961 ==> bread and cake=t 2298 conf: (0.78)
   7. vegetables=t 2961 ==> fruit=t 2207 conf: (0.75)
   8. fruit=t 2962 ==> vegetables=t 2207 conf: (0.75)
   9. bread and cake=t 3330 ==> milk-cream=t 2337 conf: (0.7)
   10. bread and cake=t 3330 ==> fruit=t 2325 conf: (0.7)
Large item sets of two sizes were found: L1 (13 item sets) and L2 (7 item sets).
[Figure: B (3)]
2. When a minimum confidence of 75% was applied, 13 cycles were performed and the 10 rules
found were:
   1. milk-cream=t fruit=t 2038 ==> bread and cake=t 1684 conf: (0.83)
   2. milk-cream=t vegetables=t 2025 ==> bread and cake=t 1658 conf: (0.82)
   3. fruit=t vegetables=t 2207 ==> bread and cake=t 1791 conf: (0.81)
   4. margarine=t 2288 ==> bread and cake=t 1831 conf: (0.8)
   5. biscuits=t 2605 ==> bread and cake=t 2083 conf: (0.8)
   6. milk-cream=t 2939 ==> bread and cake=t 2337 conf: (0.8)
   7. tissues-paper prd=t 2247 ==> bread and cake=t 1776 conf: (0.79)
   8. fruit=t 2962 ==> bread and cake=t 2325 conf: (0.78)
   9. baking needs=t 2795 ==> bread and cake=t 2191 conf: (0.78)
   10. frozen foods=t 2717 ==> bread and cake=t 2129 conf: (0.78)
Large item sets of three sizes were found: L1 (22 item sets), L2 (36 item sets) and L3 (3 item sets).
[Figure: B (4)]
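3. The rules obtained change with the minimum confidence chosen: at 75%, rules with two items on
the left-hand side (for example milk-cream and fruit) appear, while the rules with a confidence of
0.7 from the 50% run are no longer reported.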
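The confidence values in the rule listings can be checked from the two counts printed in each rule: the confidence is the number of instances matching the whole rule divided by the number matching its left-hand side. The Python sketch below recomputes a few of them from the counts reported above; the function name confidence and the tuple layout are illustrative choices, and rounding to two decimals is assumed to match how the values are displayed.

```python
def confidence(antecedent_count, rule_count):
    """Share of instances with the left-hand side that also contain the right-hand side."""
    return rule_count / antecedent_count


# (left-hand side, right-hand side, left-hand count, rule count) copied from the listings above.
reported_rules = [
    ("biscuits", "bread and cake", 2605, 2083),
    ("fruit", "bread and cake", 2962, 2325),
    ("fruit", "vegetables", 2962, 2207),
    ("bread and cake", "milk-cream", 3330, 2337),
    ("milk-cream, fruit", "bread and cake", 2038, 1684),
]

rounded = []
for lhs, rhs, lhs_count, rule_count in reported_rules:
    value = round(confidence(lhs_count, rule_count), 2)
    rounded.append(value)
    print(f"{lhs} ==> {rhs}: {value}")
```

For instance, 2207 / 2962 is about 0.745, which is displayed as 0.75 but lies just under a 0.75 threshold.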
