Iris Species IB
Shahzaman (46146)
M.Abdullah (47750)
In this project we apply the methods and techniques studied in the Business Analytics course to
carry out an analysis. In the first phase of the project we created an account on kaggle.com, where
we could explore various datasets. We selected the "Iris species" problem to work on and downloaded
the dataset.
Dataset:
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements
in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. It
includes three iris species with 50 samples each as well as some properties about each flower.
One flower species is linearly separable from the other two, but the other two are not linearly
separable from each other.
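The separability claim can be checked on a single feature. The sketch below is only illustrative: it assumes R's built-in iris data frame as a stand-in for the Kaggle file (its column names differ, and a few values may differ slightly), and it looks at petal length alone. Overlap on one feature does not by itself prove that two species are inseparable in all four dimensions.

```r
# Assumption: the built-in `iris` data frame stands in for the Kaggle CSV.
data(iris)

# Range (min, max) of petal length within each species.
petal_range <- tapply(iris$Petal.Length, iris$Species, range)
print(petal_range)

# Largest setosa petal versus the smallest petal among the other two species.
setosa_max <- max(iris$Petal.Length[iris$Species == "setosa"])
others_min <- min(iris$Petal.Length[iris$Species != "setosa"])

# TRUE means one threshold between the two values isolates setosa.
setosa_separable <- setosa_max < others_min

# The same check between versicolor and virginica on this feature.
versicolor_max <- max(iris$Petal.Length[iris$Species == "versicolor"])
virginica_min  <- min(iris$Petal.Length[iris$Species == "virginica"])

# TRUE means the two ranges overlap, so no single threshold splits them.
ranges_overlap <- versicolor_max >= virginica_min

print(c(setosa_separable = setosa_separable, ranges_overlap = ranges_overlap))
```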
The dataset has the following columns:
Id
SepalLengthCm
SepalWidthCm
PetalLengthCm
PetalWidthCm
Species
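To make the column layout concrete, here is a minimal sketch that builds a data frame with the same column names. It assumes the built-in iris data as a stand-in for the downloaded file. The name iris_df is hypothetical, and the species labels in the built-in data ("setosa") are spelled differently from those in the Kaggle file.

```r
# Assumption: built-in `iris` stands in for the Kaggle file; columns are
# renamed here to match the layout listed above.
iris_df <- data.frame(
  Id            = seq_len(nrow(iris)),
  SepalLengthCm = iris$Sepal.Length,
  SepalWidthCm  = iris$Sepal.Width,
  PetalLengthCm = iris$Petal.Length,
  PetalWidthCm  = iris$Petal.Width,
  Species       = iris$Species
)

# Column types and the first few values.
str(iris_df)

# Number of samples per species.
species_counts <- table(iris_df$Species)
print(species_counts)
```

With the actual download, the same inspection would start from read.csv() on the saved file instead of the built-in data.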
CLUSTERING:
K-Means Clustering:
First we downloaded and explored the Iris species dataset. We then decided to segment
the Iris data into clusters using K-means.
To choose K, we plotted the within-cluster sum of squares against the number of clusters:
plot(1:k.max, wss, type = "b", xlab = "Number of clusters (k)", ylab = "Within-cluster sum of squares")
From this plot we conclude that 3 is the best value of K to use for the final model.
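The plot call above uses wss and k.max, whose definitions are not shown. A minimal sketch of how they might be computed follows. The assumptions are: the built-in iris data as a stand-in for the Kaggle file, all four measurement columns as features, an upper bound of 10 clusters, and an arbitrary seed and nstart value. The original analysis may have used different columns or settings.

```r
# Assumption: built-in `iris`, all four measurement columns as features.
features <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

set.seed(123)  # arbitrary seed, only so repeated runs give the same result
k.max <- 10    # assumed upper bound on the number of clusters to try

# Total within-cluster sum of squares for each k from 1 to k.max.
# nstart = 20 reruns k-means from 20 random starts and keeps the best fit.
wss <- sapply(1:k.max, function(k) {
  kmeans(features, centers = k, nstart = 20)$tot.withinss
})

# Elbow plot: look for the k after which the curve stops dropping sharply.
plot(1:k.max, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Within-cluster sum of squares")
```

The value of K is then read off the plot as the point where adding another cluster no longer reduces the within-cluster sum of squares by much.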
Results:
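Cluster assignments (rows) against species (columns):
Cluster  Iris-setosa  Iris-versicolor  Iris-virginica
1        0            2                46
2        50           0                0
3        0            48               4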
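A table of this kind can be produced by fitting the final model with K = 3 and cross-tabulating cluster numbers against species. The sketch below again assumes the built-in iris data, all four measurement columns, and an arbitrary seed. The counts it prints depend on those choices and need not match the ones reported above. The second half repeats the agreement calculation on the reported counts themselves.

```r
# Assumption: built-in `iris`, all four measurement columns as features.
features <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

set.seed(123)  # arbitrary seed for repeatable cluster numbering
fit <- kmeans(features, centers = 3, nstart = 20)

# Rows are cluster numbers, columns are species.
confusion <- table(Cluster = fit$cluster, Species = iris$Species)
print(confusion)

# Share of flowers that fall in a cluster dominated by their own species.
purity <- sum(apply(confusion, 1, max)) / sum(confusion)

# The same calculation on the counts reported in the Results above.
reported <- matrix(c( 0,  2, 46,
                     50,  0,  0,
                      0, 48,  4), nrow = 3, byrow = TRUE)
reported_purity <- sum(apply(reported, 1, max)) / sum(reported)  # 144 / 150
print(reported_purity)
```

For the reported counts this gives 144 / 150 = 0.96: 144 flowers sit in a cluster dominated by their own species, and the other 6 sit in a cluster dominated by a different species.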
Conclusion:
We downloaded the Iris species dataset from Kaggle.com and applied the K-means clustering
technique to it. Clustering aims to group the observations across the whole data space, and we
found the optimum number of clusters to be K = 3.