
Visvesvaraya National Institute of Technology (VNIT), Nagpur

Machine Learning with Python (ECL443)

Lab Report

Submitted by :
Sakshi Gupta (BT19ECE037), Semester 7

Submitted to :
Dr. Saugata Sinha
(Course Instructor)
Department of ECE,
VNIT Nagpur

Linear Regression
Abstract: Machine Learning, as the name suggests, focuses on making machines
(computers) learn. Various algorithms are designed so that, given data, a machine
can recognise patterns in the data and predict values for future, unseen inputs.
The accuracy of an algorithm is judged by how close its predictions are to the
true values.

Introduction: Linear regression is a machine learning algorithm in which the
dependent variable (y) is written as a weighted combination of functions of the
independent variables (x1, x2, x3, ...). The model learns the optimum values of
the weights from the given dataset so as to minimise the error between its
predictions and the true values.
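In the simplest case, where each function is just the variable itself, the model
with three independent variables takes the form

    \hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3

where w_0 is the bias (intercept) term and w_1, w_2, w_3 are the weights learned
from the training data.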
Problem Statement:
Given the number of registered vehicles, the number of licensed drivers, and the
number of miles travelled by the vehicles in each state, predict the number of
traffic fatalities using linear regression.

Method/Procedure:

1. The given data file is "Matlab_accidents.mat". Load the file into a dataframe and
   split it into training and testing datasets using the function
   BT19ECE037_train_test_split.
2. Once the data is loaded into a dataframe, we need to recognise the dependent
and independent variables from the data so as to generate X and y.
3. For this problem, the columns Licensed drivers (thousands), Registered vehicles
   (thousands), and Vehicle-miles traveled (millions) are chosen as the
   independent variables, and the column Traffic fatalities is chosen as the
   dependent variable.
4. First, we use the Pseudo Inverse Method. The predicted values for the test
   dataset are computed and then compared with the actual values so as to obtain
   the accuracy.
5. Then, the Gradient Descent Algorithm is used to do the same and its accuracy
   is obtained. (Both weight-finding rules are sketched just after this list.)
6. Now, to change the relationship between the input and output variables, we
   choose an arbitrary relationship in which an extra term is added by squaring
   the Registered vehicles (thousands) column, and the accuracy is calculated again.
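As a quick reference, below is a minimal sketch of the two weight-finding rules
named in steps 4 and 5; the full, runnable versions appear in the Appendix. The
design matrix X is assumed to already carry a leading column of ones for the
bias term.

import numpy as np

def fit_pseudo_inverse(X, y):
    # Closed-form least-squares solution: w = pinv(X) @ y
    return np.linalg.pinv(X) @ y

def fit_gradient_descent(X, y, iterations=1000, learning_rate=0.001):
    # Start from random weights and repeatedly step against the
    # gradient of the squared error 0.5 * ||Xw - y||^2
    w = np.random.randn(X.shape[1])
    for _ in range(iterations):
        gradient = X.T @ (X @ w - y)
        w = w - learning_rate * gradient
    return w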
Results/Discussion:


1. The independent variables chosen are Licensed drivers (thousands), Registered
   vehicles (thousands), and Vehicle-miles traveled (millions). The dependent
   variable chosen is Traffic fatalities. The Root Mean Squared Accuracy obtained
   using the Pseudo Inverse Method is 85.32%.

Figure 1: Predicted and True Values using Pseudo Inverse
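"Root Mean Squared Accuracy" is not defined explicitly in this report; a plausible
reading, given that the targets are normalised to the range [0, 1] in the Appendix
code, is the following (this definition is an assumption, not stated in the original):

# Hypothetical accuracy measure, assuming targets normalised to [0, 1]:
# accuracy (%) = (1 - RMSE) * 100
rms_accuracy = 100.0 * (1.0 - mean_squared_error(testy, y_preds, squared=False))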

2. Using the Gradient Descent Algorithm, the Root Mean Squared Accuracy obtained
   is 91.52%. As we can see, this is a significant improvement over the Pseudo
   Inverse Method.


Figure 2: Predicted and True Values using Gradient Descent
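This result was obtained with the Appendix defaults (1000 iterations, learning
rate 0.001), random initial weights, and a randomly shuffled train/test split,
so the exact accuracy can vary from run to run.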

3. Now, in order to change the relationship, an extra feature is added by
   arbitrarily squaring the Registered vehicles (thousands) column. The RMS
   Accuracy we now obtain is 85.88%, which is nearly equal to the accuracy
   obtained with the Pseudo Inverse Method of plain linear regression.

Figure 3: Predicted and True Values after changing the relationship between input
and output
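The augmented design matrix for this changed relationship can be built as in the
following sketch, mirroring the changed_relationship function in the Appendix:

import numpy as np

def augment_with_squared_column(X):
    # Prepend a bias column of ones and a column holding the square of
    # the second feature (Registered vehicles) to the original features
    ones = np.ones((X.shape[0], 1))
    squared = np.square(X[:, 1]).reshape(-1, 1)
    return np.hstack((ones, squared, X))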

Conclusion: Linear Regression consists of formulating the dependent variable as a
weighted combination of functions of the independent variables. The model then
tries to find the optimum values of the weights so that the error is minimal.
This can be done using two methods: the Pseudo Inverse and Gradient Descent. As
obtained above, the accuracy of the Gradient Descent method is better than that
of the Pseudo Inverse method.

Appendix: The code for linear regression is given below:

# Importing required libraries and Getting the data ready

## Importing required libraries

#!pip install mat4py

import pandas as pd
import os
from mat4py import loadmat
import numpy as np
import matplotlib.pyplot as plt
import sys
from sklearn.metrics import mean_squared_error, mean_absolute_error

"""## Getting the data

**Importing data into a dataframe and Splitting into Train and Test Datasets**
"""

def BT19ECE120_datasetdivshuffle(filepath, traintestratio=0.2):
    ext = os.path.splitext(filepath)[1]
    # Import the data according to the file extension
    if ext == ".csv":
        data = pd.read_csv(filepath)
    elif ext == ".xlsx":
        data = pd.read_excel(filepath)
    elif ext == ".mat":
        load_data = loadmat(filepath)
        datamat = load_data["accidents"]
        data = pd.DataFrame(datamat["hwydata"], columns=datamat["hwyheaders"])
        states = [x[0] for x in datamat["statelabel"]]
        data.insert(loc=1, column="State", value=states)
    else:
        print("File not found")
        return None
    trainfrac = 1 - traintestratio
    # Randomly sample the training rows, then take the remaining rows
    # as the test set so that the two sets do not overlap
    train = data.sample(frac=trainfrac)
    test = data.drop(train.index)
    return train, test

train, test = BT19ECE120_datasetdivshuffle("./Matlab_accidents.mat")
train.head()

"""**Gathering the required Dependent and Independent Variables**

Here, *Licensed drivers (thousands)*, *Registered vehicles (thousands)* and
*Vehicle-miles traveled (millions)* are the independent variables and
*Traffic fatalities* is the dependent variable.
"""

trainX = train[['Licensed drivers (thousands)', 'Registered vehicles (thousands)',
                'Vehicle-miles traveled (millions)']].copy()
trainy = train['Traffic fatalities']
testX = test[['Licensed drivers (thousands)', 'Registered vehicles (thousands)',
              'Vehicle-miles traveled (millions)']].copy()
testy = test['Traffic fatalities']

"""**Normalizing train and test variables**"""

for column in trainX:
    trainX[column] = trainX[column] / np.amax(trainX[column])
trainy = trainy / np.amax(trainy)
for column in testX:
    testX[column] = testX[column] / np.amax(testX[column])
testy = testy / np.amax(testy)

trainX = np.array(trainX)
testX = np.array(testX)
trainy = np.array(trainy)
testy = np.array(testy)

"""# Solving using Pseudo-Inverse Method"""

def linreg_pseudoinv(trainX, trainy, testX, testy):
    # Creating the theta (design) matrices for training and testing
    ones_train = np.ones([trainX.shape[0], 1])
    train_theta = np.hstack((ones_train, trainX))

    ones_test = np.ones([testX.shape[0], 1])
    test_theta = np.hstack((ones_test, testX))

    # Finding the optimum weights
    weights = np.matmul(np.linalg.pinv(train_theta), trainy)

    # Predicting y using the optimum weights
    y_preds = np.matmul(test_theta, weights)
    plt.plot(testy)
    plt.plot(y_preds)
    print("Mean Squared Error: ", mean_squared_error(testy, y_preds))
    print("Root Mean Squared Error: ", mean_squared_error(testy, y_preds, squared=False))
    print("Mean Absolute Error: ", mean_absolute_error(testy, y_preds))

linreg_pseudoinv(trainX, trainy, testX, testy)

"""# Solving Using Gradient Descent"""

def gradient_descent(trainX, trainy, testX, testy, iterations=1000, learning_rate=0.001):
    # Creating the theta (design) matrices for training and testing
    ones_train = np.ones([trainX.shape[0], 1])
    train_theta = np.hstack((ones_train, trainX))

    ones_test = np.ones([testX.shape[0], 1])
    test_theta = np.hstack((ones_test, testX))

    # Considering some random values for the weights initially,
    # one weight per column of the design matrix
    weights = np.random.randn(train_theta.shape[1])

    # Applying gradient descent to find the optimum weights
    for i in range(iterations):
        temp = np.matmul(train_theta, weights) - trainy
        error = np.matmul(train_theta.T, temp)
        weights = weights - learning_rate * error

    # Predicting y using the optimum weights obtained
    y_preds = np.matmul(test_theta, weights)
    plt.plot(testy)
    plt.plot(y_preds)
    print("Mean Squared Error: ", mean_squared_error(testy, y_preds))
    print("Root Mean Squared Error: ", mean_squared_error(testy, y_preds, squared=False))
    print("Mean Absolute Error: ", mean_absolute_error(testy, y_preds))

gradient_descent(trainX, trainy, testX, testy)

"""# Changing relationship between input and output variables"""

def changed_relationship(trainX, trainy, testX, testy):
    # Redefining the theta matrix for a new relationship: an extra
    # column holding the square of the Registered vehicles feature
    ones_train = np.ones([trainX.shape[0], 1])
    squaredcoltrain = np.square(trainX.T[1]).reshape(trainX.shape[0], 1)
    train_theta = np.hstack((ones_train, squaredcoltrain, trainX))

    ones_test = np.ones([testX.shape[0], 1])
    squaredcoltest = np.square(testX.T[1]).reshape(testX.shape[0], 1)
    test_theta = np.hstack((ones_test, squaredcoltest, testX))

    # Finding the optimum weights
    weights = np.matmul(np.linalg.pinv(train_theta), trainy)

    # Predicting y using the optimum weights
    y_preds = np.matmul(test_theta, weights)
    plt.plot(testy)
    plt.plot(y_preds)
    print("Mean Squared Error: ", mean_squared_error(testy, y_preds))
    print("Root Mean Squared Error: ", mean_squared_error(testy, y_preds, squared=False))
    print("Mean Absolute Error: ", mean_absolute_error(testy, y_preds))

changed_relationship(trainX, trainy, testX, testy)
