
SHUBHAM SHARMA

1/18/FET/BCS/188

PRACTICAL 2
Write a program to demonstrate the working of Linear Regression. Use an
appropriate dataset and evaluate the result.

#import the data set first


#shubham sharma 1/18/FET/BCS/188
# Data.csv must be in the current folder (working directory); use
# setwd("location of Data.csv") to change the working directory if needed.

dataset <- read.csv("Data.csv")

#Showing the dataset


dataset
## Country Age Salary Purchased
## 1 France 44 72000 No
## 2 Spain 27 48000 Yes
## 3 Germany 30 54000 No
## 4 Spain 38 61000 No
## 5 Germany 40 NA Yes
## 6 France 35 58000 Yes
## 7 Spain NA 52000 No
## 8 France 48 79000 Yes
## 9 Germany 50 83000 No
## 10 France 37 67000 Yes
# So missing values are present in both the Age and Salary columns

#taking care of missing values

# by replacing each with the mean of the non-NA entries.

dataset$Age <- ifelse(is.na(dataset$Age),
                      ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                      dataset$Age)

dataset$Salary <- ifelse(is.na(dataset$Salary),
                         ave(dataset$Salary, FUN = function(x) mean(x, na.rm = TRUE)),
                         dataset$Salary)

Here ifelse() checks each entry: if it is NA, it is replaced by the column mean
computed over the non-NA values; otherwise, whatever is already present in the
column is kept.
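The same mean-imputation idea can be sketched in Python with pandas (a rough equivalent for comparison, not part of the practical; the toy values mirror Data.csv):

```python
import pandas as pd

# Toy frame mirroring Data.csv, with one missing value per column
df = pd.DataFrame({
    "Age":    [44, 27, None, 38],
    "Salary": [72000, None, 54000, 61000],
})

# Replace each NaN with the mean of the column's non-missing entries,
# like ifelse(is.na(...), ave(..., mean with na.rm = TRUE), ...) in R
for col in ["Age", "Salary"]:
    df[col] = df[col].fillna(df[col].mean())

print(df)
```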

How mean() handles NA values:

#defining x = 1 2 3
x <- 1:3
#introducing missing value
x[1] <- NA
# mean = NA
mean(x)
## [1] NA
# mean = mean excluding the NA value
mean(x, na.rm = T)
## [1] 2.5
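NumPy behaves the same way as R here: np.mean propagates NaN, while np.nanmean skips it (a side-by-side note, not part of the original code):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
x[0] = np.nan            # introduce a missing value

print(np.mean(x))        # nan: NaN propagates, like mean(x) in R
print(np.nanmean(x))     # 2.5: NaN excluded, like mean(x, na.rm = TRUE)
```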

So finally the dataset looks like this :

dataset
## Country Age Salary Purchased
## 1 France 44.00000 72000.00 No
## 2 Spain 27.00000 48000.00 Yes
## 3 Germany 30.00000 54000.00 No
## 4 Spain 38.00000 61000.00 No
## 5 Germany 40.00000 63777.78 Yes
## 6 France 35.00000 58000.00 Yes
## 7 Spain 38.77778 52000.00 No
## 8 France 48.00000 79000.00 Yes
## 9 Germany 50.00000 83000.00 No
## 10 France 37.00000 67000.00 Yes

Categorical data

Categorical data is non-numeric data that belongs to a specific set of categories, like the Country
column in the dataset.
By default, the read.csv() function in R turns all string variables into categorical variables (factors), but
suppose there is a name column in the dataset; in that case we don't want it treated as a categorical
variable. Below is the code to turn specific variables into factor variables.

# Encoding categorical data


dataset$Country = factor(dataset$Country,
levels = c('France', 'Spain', 'Germany'),
labels = c(1, 2, 3))

dataset$Purchased = factor(dataset$Purchased,
levels = c('No', 'Yes'),
labels = c(0, 1))
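The same encoding can be sketched in Python with pandas maps, using the same integer labels as the factor() calls above (an illustrative equivalent, not part of the practical):

```python
import pandas as pd

df = pd.DataFrame({
    "Country":   ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "No"],
})

# Same level-to-label assignment as the factor() calls in the R code
df["Country"]   = df["Country"].map({"France": 1, "Spain": 2, "Germany": 3})
df["Purchased"] = df["Purchased"].map({"No": 0, "Yes": 1})

print(df)
```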

Splitting into training and test sets: when a dataset is given to us for a machine learning task, we
need some of the data for training and some to test the model after the learning stage is
done.

We can split the dataset into training and test sets with the code below.
For this we need to install caTools:

#install.packages("caTools") #if not present


library(caTools) #adding caTools to the library
set.seed(123) # ensures the same output each run, since the split is random;
# you can omit this in real use
split = sample.split(dataset$Purchased,SplitRatio = 0.8)
training_set = subset(dataset,split == TRUE)
test_set = subset(dataset, split == FALSE)

SplitRatio is the proportion in which the data is divided between training and test sets; it is usually
set at 80:20 for training and test respectively.
The sample.split() method takes a column and produces a logical vector with TRUE and FALSE values at
random positions, in the given split ratio.
The subset() method takes the dataset and returns the subset matching the condition.
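The split logic can be sketched in plain Python; here random.sample plays the role of sample.split, and the fixed seed mirrors set.seed(123) (a sketch on ten stand-in rows, not the caTools implementation):

```python
import random

rows = list(range(10))            # stand-ins for the 10 dataset rows

random.seed(123)                  # like set.seed(123): reproducible random split
test_idx = set(random.sample(range(len(rows)), k=len(rows) // 5))  # 20% test

training_set = [r for i, r in enumerate(rows) if i not in test_idx]
test_set     = [r for i, r in enumerate(rows) if i in test_idx]

print(len(training_set), len(test_set))   # 8 2
```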

Feature Scaling :

Age and Salary are on very different numeric scales, so the larger-valued feature would dominate any
distance-based computation. Hence, we need feature scaling, which is done in the below steps :

#feature scaling
training_set[,2:3] = scale(training_set[,2:3])
test_set[,2:3] = scale(test_set[,2:3])

The index 2:3 selects both the Age and Salary columns. Now the dataset (training and test both) looks like :

training_set
## Country Age Salary Purchased
## 1 1 0.90101716 0.9392746 0
## 2 2 -1.58847494 -1.3371160 1
## 3 3 -1.14915281 -0.7680183 0
## 4 2 0.02237289 -0.1040711 0
## 5 3 0.31525431 0.1594000 1
## 7 2 0.13627122 -0.9577176 0
## 8 1 1.48678000 1.6032218 1
## 10 1 -0.12406783 0.4650265 1
test_set
## Country Age Salary Purchased
## 6 1 -0.7071068 -0.7071068 1
## 9 3 0.7071068 0.7071068 0
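The scale() call used above is z-score standardization: subtract each column's mean and divide by its standard deviation. A minimal NumPy sketch on toy Age/Salary values (ddof=1 matches R's sd()):

```python
import numpy as np

X = np.array([[44.0, 72000.0],
              [27.0, 48000.0],
              [30.0, 54000.0],
              [38.0, 61000.0]])

# Column-wise z-score: (value - column mean) / column standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(X_scaled.mean(axis=0))          # each column now has mean ~0
print(X_scaled.std(axis=0, ddof=1))   # and standard deviation 1
```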

PRACTICAL 2
Write a program to implement K-means clustering. Use an appropriate dataset and
evaluate the algorithm.

Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
1   5.1            3.5           1.4            0.2           Iris-setosa
2   4.9            3.0           1.4            0.2           Iris-setosa
3   4.7            3.2           1.3            0.2           Iris-setosa
4   4.6            3.1           1.5            0.2           Iris-setosa
5   5.0            3.6           1.4            0.2           Iris-setosa
6   5.4            3.9           1.7            0.4           Iris-setosa
7   4.6            3.4           1.4            0.3           Iris-setosa
8   5.0            3.4           1.5            0.2           Iris-setosa
9   4.4            2.9           1.4            0.2           Iris-setosa
10  4.9            3.1           1.5            0.1           Iris-setosa

#!/usr/bin/env python

# Implement K-Means Clustering

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# import the dataset
iris = pd.read_csv('../datasets/iris.csv')
iris.head()

# feature selection
x = iris.iloc[:, [0, 1, 2, 3]].values

# Generate model
# Find optimum value of k (elbow method)
Error = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i).fit(x)
    Error.append(kmeans.inertia_)
plt.plot(range(1, 11), Error)
plt.title('Elbow method')
plt.xlabel('No of clusters')
plt.ylabel('Error')
plt.show()

# Elbow formed at approx 3, thus k = 3
kmeans3 = KMeans(n_clusters=3, max_iter=400)
y_kmeans3 = kmeans3.fit_predict(x)
print(y_kmeans3)
kmeans3.cluster_centers_

# Visualizing clusters
plt.scatter(x[:, 0], x[:, 1], c=y_kmeans3, cmap='rainbow')

# Evaluating algorithm
X = x
y = pd.factorize(iris.iloc[:, 4])[0]  # integer-encode the Species labels

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=3).fit(X_scaled)

correct = 0
for i in range(len(X)):
    predict_me = np.array(X_scaled[i].astype(float))
    predict_me = predict_me.reshape(-1, len(predict_me))
    prediction = kmeans.predict(predict_me)
    if prediction[0] == y[i]:
        correct += 1

print("Accuracy:", correct / len(X))
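One caveat with this accuracy check: K-means cluster numbers are arbitrary, so cluster 0 need not line up with species 0, and a perfect clustering can still score poorly. A common fix, sketched here on hypothetical toy labels in pure Python, is to try every mapping of cluster IDs to class IDs and keep the best:

```python
from itertools import permutations

def matched_accuracy(y_true, y_pred, k=3):
    # Try every relabelling of the k cluster IDs; keep the best agreement
    best = 0
    for perm in permutations(range(k)):
        hits = sum(perm[p] == t for p, t in zip(y_pred, y_true))
        best = max(best, hits)
    return best / len(y_true)

# Same grouping, different numbering: accuracy is 1.0 after relabelling
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [2, 2, 0, 0, 1, 1]
print(matched_accuracy(y_true, y_pred))   # 1.0
```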

Output