
HEMANT BHAGWAN SAWALE

FMS-MBA-2019-21-006

SEC: A

1. Construct clusters using the K-means algorithm. i. Use a suitable data set / Weka library data (Iris data
set). ii. Use the Weka software tool.

INFERENCE

The data used for clustering contains details about the Iris flower. The data consists of 5 attributes,
namely:

• Sepal length: specifies the sepal length of an iris flower.

• Sepal width: specifies the sepal width of an iris flower.

• Petal length: specifies the petal length of an iris flower.

• Petal width: specifies the petal width of an iris flower.

• Class: specifies the different species of the genus Iris.

The data set consists of 150 instances.

A sample of the data is shown below in Figure: A (1).

Figure: A (1)

The data set was pre-processed, and the result is shown in Figure: A (2).

Figure: A (2)

Weka is used to perform clustering on the given data set with the K-means algorithm, one of the most
commonly used clustering algorithms. In Weka, the clustering tool used is ‘SimpleKMeans’. The clustered
output is shown in Figure: A (3) and Figure: A (4).
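The idea behind K-means can be sketched in a few lines of Python. This is only an illustrative sketch of the basic assign-and-update loop, not Weka's SimpleKMeans: the function name `k_means`, the toy rows and the fixed starting centroids are all assumptions made for the example, and the handling of nominal attributes such as class is left out.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Basic K-means loop: assign each point to its nearest centroid,
    then move each centroid to the mean of its points, until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy rows (petal length, petal width), not read from the Iris file.
toy_points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3),
              (4.7, 1.4), (4.5, 1.5), (5.1, 1.8)]
toy_centroids, toy_labels = k_means(
    toy_points, centroids=[toy_points[0], toy_points[3]])
print(toy_labels)  # [0, 0, 0, 1, 1, 1]
```

With two starting centroids the loop settles after one update, which mirrors the two-cluster setting used in this exercise.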

Figure: A (3)
Figure: A (4)

CONCLUSION

After clustering, we can see that there are 2 clusters:

Cluster 0:

• sepallength: 6.1 sepalwidth: 2.9 petallength: 4.7 petalwidth: 1.4 class: Iris-versicolor

Cluster 1:

• sepallength: 6.2 sepalwidth: 2.9 petallength: 4.3 petalwidth: 1.3 class: Iris-versicolor

The cluster instances show that 67% (100 instances) of the data set belong to Cluster 0 and 33% (50
instances) belong to Cluster 1. We can conclude that Iris-versicolor is the dominant class label among
the clusters, and that the larger cluster (Cluster 0) has the following characteristics:

sepallength: 6.1, sepalwidth: 2.9, petallength: 4.7, petalwidth: 1.4

-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-

2. Construct association rules using the Apriori algorithm. i. Use a suitable data set / Weka library data
(Supermarket data set). ii. Use the Weka software tool.

INFERENCE

The data used contains details about a supermarket. The data consists of 217 attributes, some of which
are:

• Different departments

• Different types of commodities like coffee, spices, confectionery, medicine, food items etc.

The data set contains 4627 instances, indicating that there were 4627 customers who purchased items
from the supermarket. A sample of the data set is shown in Figure B (1).

Figure B (1)

The ‘t’ in the data indicates an item that a customer has purchased. A blank cell indicates that the
customer has not bought that item.
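The ‘t’ / blank encoding can be turned into a basket of items per customer. The sketch below assumes the rows have already been read into lists of cell strings; the helper name `row_to_basket` and the three-column excerpt are hypothetical and only illustrate the idea.

```python
def row_to_basket(header, row, bought_marker="t"):
    """Return the set of item names whose cell holds the purchase marker."""
    return {name for name, cell in zip(header, row)
            if cell.strip() == bought_marker}

# Hypothetical three-column excerpt; the real data set has 217 attributes.
header = ["bread and cake", "biscuits", "milk-cream"]
rows = [
    ["t", "", "t"],   # bought bread and cake, milk-cream
    ["", "", "t"],    # bought milk-cream only
    ["t", "t", ""],   # bought bread and cake, biscuits
]
baskets = [row_to_basket(header, r) for r in rows]
print(len(baskets[0]))  # 2
```

Representing each customer as a set makes the later "does this basket contain these items" checks a simple subset test.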

The pre-processed data is shown in Figure B (2).

Figure: B (2)

To run the Apriori algorithm on the given data, we use the Weka Associator tool ‘Apriori’, where we
specify the minimum confidence that the generated rules must meet.
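To make the confidence threshold concrete, the sketch below computes the support count and confidence of a single rule on a few made-up baskets. The function names and the toy baskets are assumptions for illustration, not part of Weka; the confidence of a rule A ==> B is taken as count(A and B) / count(A).

```python
def support_count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= b)

def confidence(baskets, antecedent, consequent):
    """conf(A ==> B) = count(A and B) / count(A); 0.0 if A never occurs."""
    a_count = support_count(baskets, antecedent)
    if a_count == 0:
        return 0.0
    both = support_count(baskets, set(antecedent) | set(consequent))
    return both / a_count

# Hypothetical baskets, not taken from the supermarket data set.
toy_baskets = [
    {"biscuits", "bread and cake"},
    {"biscuits", "bread and cake", "fruit"},
    {"biscuits"},
    {"fruit", "bread and cake"},
]
rule_conf = confidence(toy_baskets, {"biscuits"}, {"bread and cake"})
print(round(rule_conf, 2))  # 0.67
```

A rule is kept only if its confidence is at least the chosen threshold, so this toy rule would pass at 0.5 but not at 0.75.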

The Associator output is as follows [Figure B (3) and Figure B (4)].

Figure: B (3)

Figure: B (3) shows the rules for the data set when the minimum confidence is 50% (0.5) and the minimum
support is 45% (0.45, or 2082 instances).

Figure: B (4)

Figure: B (4) shows the rules for the data set when the minimum confidence is 75% (0.75) and the minimum
support is 35% (0.35, or 1619 instances).
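The instance counts quoted with each minimum support follow from multiplying the support fraction by the 4627 instances in the data set. The short sketch below reproduces them; rounding to the nearest whole instance is an assumption that happens to match the reported figures, and the function name is hypothetical.

```python
NUM_INSTANCES = 4627  # instances in the supermarket data set, as stated above

def min_support_count(fraction, n=NUM_INSTANCES):
    """Convert a fractional minimum support into an instance count.
    Rounding to the nearest integer is an assumption, not a documented rule."""
    return round(fraction * n)

print(min_support_count(0.45))  # 2082
print(min_support_count(0.35))  # 1619
```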

CONCLUSION

1. When 50% confidence was used, 11 cycles were performed and 10 rules were obtained:

1. biscuits=t 2605 ==> bread and cake=t 2083 conf: (0.8)
2. milk-cream=t 2939 ==> bread and cake=t 2337 conf: (0.8)
3. fruit=t 2962 ==> bread and cake=t 2325 conf: (0.78)
4. baking needs=t 2795 ==> bread and cake=t 2191 conf: (0.78)
5. frozen foods=t 2717 ==> bread and cake=t 2129 conf: (0.78)
6. vegetables=t 2961 ==> bread and cake=t 2298 conf: (0.78)
7. vegetables=t 2961 ==> fruit=t 2207 conf: (0.75)
8. fruit=t 2962 ==> vegetables=t 2207 conf: (0.75)
9. bread and cake=t 3330 ==> milk-cream=t 2337 conf: (0.7)
10. bread and cake=t 3330 ==> fruit=t 2325 conf: (0.7)

Large item sets were found at two levels: L1 (13 item sets) and L2 (7 item sets).

[Figure: B (3)]
2. When 75% confidence was applied, 13 cycles were performed and the 10 rules found are:

1. milk-cream=t fruit=t 2038 ==> bread and cake=t 1684 conf: (0.83)
2. milk-cream=t vegetables=t 2025 ==> bread and cake=t 1658 conf: (0.82)
3. fruit=t vegetables=t 2207 ==> bread and cake=t 1791 conf: (0.81)
4. margarine=t 2288 ==> bread and cake=t 1831 conf: (0.8)
5. biscuits=t 2605 ==> bread and cake=t 2083 conf: (0.8)
6. milk-cream=t 2939 ==> bread and cake=t 2337 conf: (0.8)
7. tissues-paper prd=t 2247 ==> bread and cake=t 1776 conf: (0.79)
8. fruit=t 2962 ==> bread and cake=t 2325 conf: (0.78)
9. baking needs=t 2795 ==> bread and cake=t 2191 conf: (0.78)
10. frozen foods=t 2717 ==> bread and cake=t 2129 conf: (0.78)

Large item sets were found at three levels: L1 (22 item sets), L2 (36 item sets) and L3 (3 item sets).

[Figure: B (4)]

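3. The rules obtained change with the minimum confidence chosen: raising it from 50% to 75% (with the
minimum support falling from 45% to 35%) brings in two-item rules such as
milk-cream=t fruit=t ==> bread and cake=t in place of some single-item rules.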
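The levels L1, L2 and L3 mentioned above are the large (frequent) item sets of size 1, 2 and 3. A minimal sketch of how such levels can be built one size at a time is given below. It is a plain level-wise search written for illustration, with a hypothetical function name and made-up baskets, and is not a description of Weka's internal implementation.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_count):
    """Return {k: {itemset: count}} for each size k that has frequent item sets."""
    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    # Level 1: single items that meet the minimum count.
    singles = {frozenset([i]) for b in baskets for i in b}
    level = {s: count(s) for s in singles}
    level = {s: c for s, c in level.items() if c >= min_count}

    levels, k = {}, 1
    while level:
        levels[k] = level
        k += 1
        # Candidates: unions of two frequent (k-1)-item sets that have size k.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counted = {c: count(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_count}
    return levels

# Hypothetical baskets, not taken from the supermarket data set.
toy_baskets = [
    {"bread and cake", "milk-cream", "fruit"},
    {"bread and cake", "milk-cream"},
    {"bread and cake", "fruit"},
    {"milk-cream", "fruit"},
    {"bread and cake", "milk-cream", "fruit"},
]
toy_levels = frequent_itemsets(toy_baskets, min_count=3)
print({k: len(v) for k, v in toy_levels.items()})  # {1: 3, 2: 3}
```

In this toy case the only three-item set appears in just two baskets, so no L3 is produced. Lowering the minimum count, as happens when the minimum support falls from 45% to 35%, lets more and larger item sets through.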
