Kman 07
No: 07
Data and Text clustering using K-means algorithm.
DATE:
Objective:
Hardware Specification:
Software Specification:
Python 3.12
Algorithm:
The program applies the K-means algorithm to group used cars by their year,
price, and mileage. K-means is an unsupervised machine learning algorithm
that partitions data points into k clusters by assigning each point to the
cluster with the nearest centroid, so that similar points end up together.
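To make the assignment and update steps concrete, here is a minimal from-scratch sketch of the standard K-means iteration (Lloyd's algorithm) in plain Python. It is an illustration only and is not the program used in this experiment, which relies on scikit-learn's KMeans. The function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd's algorithm; returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    # Within-cluster sum of squares (WCSS) for the final assignment.
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Hypothetical toy data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels, wcss = kmeans(points, k=2)
print(centroids, labels, wcss)
```

The result depends on the random initial centroids, which is why library implementations usually run several initialisations and keep the one with the lowest WCSS.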
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Based on the Elbow Method, choose the optimal number of clusters and
# perform clustering
k = 3  # Change this to the chosen number of clusters
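The listing above does not show how the Elbow Method curve was produced. One common way to compute it with scikit-learn is sketched below, under stated assumptions: the data here is synthetic and only stands in for the year, price, and mileage columns, and the names `group_means`, `group_sds`, `k_values`, and `wcss` are hypothetical. The original program may have done this differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the year, price, and mileage columns.
# These numbers are invented for illustration and are NOT the real dataset.
rng = np.random.default_rng(0)
group_means = np.array([[2010.0, 15000.0, 30000.0],
                        [2007.0, 11000.0, 70000.0],
                        [2003.0, 6000.0, 110000.0]])
group_sds = np.array([1.0, 1500.0, 8000.0])
X = np.vstack([rng.normal(m, group_sds, size=(50, 3)) for m in group_means])

# Standardize so that price and mileage do not dominate the distance measure.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for k = 1..10 and record the WCSS (scikit-learn's inertia_).
k_values = range(1, 11)
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    wcss.append(model.inertia_)

for k, value in zip(k_values, wcss):
    print(f"k={k}: WCSS={value:.2f}")
```

Plotting `wcss` against `k_values` (for example with `plt.plot`) gives the elbow graph; the chosen k is the point after which the curve flattens.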
Cluster Centers:
K-means segmented the used car dataset into three clusters based on the
year, price, and mileage features. The Elbow Method graph suggests three as
the number of clusters: beyond k = 3 the within-cluster sum of squares (WCSS)
levels off, so additional clusters give little further reduction. In the
scatter plot, the yellow cluster contains newer and more expensive cars, the
turquoise cluster contains mid-range cars in terms of age and price, and the
purple cluster contains older and less expensive cars with higher mileage.

The cluster centers in the output quantify these differences. Cluster 0 has
the newest cars (model years around 2009-2010), with higher prices and lower
mileage. Cluster 1 has slightly older cars (around 2007-2008), with moderate
prices and higher mileage. Cluster 2 has the oldest cars (around 2002-2003),
with the lowest prices and highest mileage. The clustering reflects the
expected pattern in the used car market: newer cars with lower mileage
command higher prices, while older cars with higher mileage are generally
less expensive.
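When the features are standardized before clustering, the fitted cluster centers are in standardized units and have to be mapped back before they can be read as model years, prices, and mileages. A minimal sketch of that step follows, assuming a StandardScaler was used (it is imported in the program, but its use is not shown). The data is synthetic and the variable names are hypothetical, so the printed values will not match the centers discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the year, price, and mileage columns.
# These numbers are invented for illustration and are NOT the real dataset.
rng = np.random.default_rng(1)
group_means = np.array([[2010.0, 15000.0, 30000.0],
                        [2007.0, 11000.0, 70000.0],
                        [2003.0, 6000.0, 110000.0]])
group_sds = np.array([1.0, 1500.0, 8000.0])
X = np.vstack([rng.normal(m, group_sds, size=(50, 3)) for m in group_means])

# Standardize the features, then cluster with k = 3.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# cluster_centers_ are in standardized units; undo the scaling to get
# centers expressed in the original units of each feature.
centers_original = scaler.inverse_transform(kmeans.cluster_centers_)
for i, (year, price, mileage) in enumerate(centers_original):
    print(f"Cluster {i}: year={year:.0f}, price={price:.0f}, mileage={mileage:.0f}")
```

The cluster indices assigned by K-means are arbitrary, so which group ends up as Cluster 0, 1, or 2 can change with the random seed.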
RESULT:
The K-means clustering algorithm grouped the used car dataset from
https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/usedcars.csv
into three clusters based on the year, price, and mileage features. Cluster 0
contains newer and more expensive cars, Cluster 1 contains mid-range cars, and
Cluster 2 contains older and less expensive cars with higher mileage.