BT19ECE037 Lab 2
Submitted by :
Sakshi Gupta (BT19ECE037) Semester 7
Submitted to :
Dr. Saugata Sinha
(Course Instructor)
Department of ECE,
VNIT Nagpur
1.0 Sakshi Gupta (BT19ECE037)
Linear Regression
Abstract: Machine Learning, as the name suggests, focuses on making machines
(computers) learn. Various algorithms are designed so that, given data, a
machine can recognise patterns in the data and then predict future values.
The accuracy of an algorithm depends on the percentage of correct predictions
it is able to make.
Method/Procedure:
1. The given data file is the MATLAB file "accidents.mat". Load the file into a
dataframe and split it into training and testing datasets using the function
BT19ECE037_train_test_split.
2. Once the data is loaded into a dataframe, we need to recognise the dependent
and independent variables from the data so as to generate X and y.
3. For this problem, the columns Licensed drivers (thousands), Registered vehicles
(thousands), and Vehicles-miles travelled (millions) are chosen as the
independent variables, and the column Traffic fatalities is chosen as the
dependent variable.
4. First, we use the Pseudo Inverse Method. The predicted values for the test
dataset are computed and then compared with the actual values so as to obtain
the accuracy.
5. Then, the Gradient Descent Algorithm is used for the same prediction task,
and its accuracy is obtained.
6. Now, to change the relationship between the input and output variables, we
choose an arbitrary one: an extra term is added after squaring the
Registered vehicles (thousands) column, and the accuracy is calculated again.
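The Pseudo Inverse step (step 4) can be sketched as follows. This is a minimal illustration using NumPy and synthetic data in place of the accidents dataset; the helper names `pseudo_inverse_fit` and `pseudo_inverse_predict` are illustrative, not the report's actual functions.

```python
import numpy as np

def pseudo_inverse_fit(X, y):
    """Least-squares weights via the Moore-Penrose pseudo-inverse."""
    ones = np.ones((X.shape[0], 1))
    Xb = np.hstack((ones, X))        # prepend a bias column
    return np.linalg.pinv(Xb) @ y    # w minimises ||Xb w - y||

def pseudo_inverse_predict(X, w):
    ones = np.ones((X.shape[0], 1))
    return np.hstack((ones, X)) @ w

# Synthetic check: y = 1 + 2*x0 + 3*x1 exactly, so the fit recovers [1, 2, 3].
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, 3.0]) + 1.0
w = pseudo_inverse_fit(X, y)  # -> approximately [1.0, 2.0, 3.0]
```

On noise-free data the pseudo-inverse solution recovers the generating weights exactly; on real data it gives the least-squares fit.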
Results/Discussion:
2. Using the Gradient Descent Algorithm, the Root Mean Squared Accuracy
obtained is 91.52%. As we can see, this is a significant improvement over the
Pseudo Inverse method.
3. Now, in order to change the relationship, an extra term is arbitrarily added
after squaring the Registered vehicles (thousands) column. The RMS Accuracy we
now obtain is 85.88%, which is nearly equal to the accuracy obtained with the
Pseudo Inverse Method of Linear Regression.
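The extract does not define the "RMS Accuracy" metric it quotes. One plausible definition, assumed here purely for illustration, subtracts from 100% the root-mean-squared error expressed as a fraction of the mean true value:

```python
import numpy as np

# Hypothetical "RMS Accuracy": this definition is an assumption for
# illustration, not taken from the report.
def rms_accuracy(y_true, y_pred):
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * (1.0 - rmse / np.mean(y_true))

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 310.0])
acc = rms_accuracy(y_true, y_pred)  # RMSE = 10, mean = 200 -> 95.0
```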
Figure 3: Predicted and true values after changing the relationship between
input and output
# Importing required libraries and getting the data ready

import pandas as pd
import os
from mat4py import loadmat
import numpy as np
import matplotlib.pyplot as plt
import sys
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Importing data into a dataframe and splitting into Train and Test Datasets
datamat = loadmat("accidents.mat")
df = pd.DataFrame(datamat["hwydata"], columns=datamat["hwyheaders"])
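The dataframe construction can be exercised without the .mat file by mimicking the structure `mat4py.loadmat` returns. The dict keys and column names below match the report; the two data rows are dummy values for illustration only:

```python
import pandas as pd

# Dummy stand-in for mat4py.loadmat("accidents.mat"): "hwydata" is a list of
# rows, "hwyheaders" the column names (values here are made up).
datamat = {
    "hwyheaders": ["Licensed drivers (thousands)",
                   "Registered vehicles (thousands)",
                   "Vehicles-miles travelled (millions)",
                   "Traffic fatalities"],
    "hwydata": [[3000.0, 2500.0, 40000.0, 900.0],
                [1200.0, 1100.0, 15000.0, 350.0]],
}
df = pd.DataFrame(datamat["hwydata"], columns=datamat["hwyheaders"])

# Independent variables X and dependent variable y, as in steps 2-3.
X = df.drop(columns="Traffic fatalities").to_numpy()
y = df["Traffic fatalities"].to_numpy()
```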
trainX = np.array(trainX)
testX = np.array(testX)
trainy = np.array(trainy)
testy = np.array(testy)

onestest = np.ones([testX.shape[0], 1])
testtheta = np.hstack((onestest, testX))
def gradient_descent(trainX, trainy, testX, testy, iterations=1000, learning_rate=0.001):
    # Creating the theta matrices for training and testing
    onestrain = np.ones([trainX.shape[0], 1])
    traintheta = np.hstack((onestrain, trainX))

    onestest = np.ones([testX.shape[0], 1])
    testtheta = np.hstack((onestest, testX))
    # Initialising the weights (the original lines are missing from the
    # extract; zero initialisation is assumed)
    weights = np.zeros([traintheta.shape[1], 1])

    # Applying gradient descent to find the optimum weights
    for i in range(iterations):
        temp = np.matmul(traintheta, weights) - trainy
        error = np.matmul(traintheta.T, temp)
        weights = weights - learning_rate * error
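The update rule above can be checked end to end on synthetic data. The sketch below uses the same update and the report's defaults (iterations=1000, learning_rate=0.001); the data, seed, and zero initialisation are illustrative assumptions:

```python
import numpy as np

def gradient_descent(trainX, trainy, iterations=1000, learning_rate=0.001):
    # Prepend a bias column, as in the report's traintheta construction.
    traintheta = np.hstack((np.ones((trainX.shape[0], 1)), trainX))
    weights = np.zeros((traintheta.shape[1], 1))  # zero init (assumed)
    for _ in range(iterations):
        temp = np.matmul(traintheta, weights) - trainy  # residuals
        error = np.matmul(traintheta.T, temp)           # gradient of 0.5*||r||^2
        weights = weights - learning_rate * error
    return weights

# Synthetic check: y = 1 + 2*x0 + 3*x1 exactly.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([[2.0], [3.0]]) + 1.0
w = gradient_descent(X, y)  # w approaches [[1], [2], [3]]
```

Because the data are noise-free and the step size keeps every eigen-direction of the quadratic loss contracting, the weights converge to the generating values; on real data the learning rate may need tuning to avoid divergence.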