AI Lab 10


Question:

Solve the multivariable linear regression problem for the "Marks" dataset, using mean normalization to scale the features. Analyze the model with different learning rates, and also provide a solution in a limited number of iterations by observing the cost function.
Solution:
Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import numpy.linalg as LA
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Training and testing data


data = pd.read_csv('marks.csv')
x1 = data['Quiz'].values
x2 = data['Assg'].values
x3 = data['Mid'].values
m = len(x1)
x0 = np.ones(m)
X = np.array([x0, x3]).T
Y = data['Final'].values

# Feature scaling using mean normalization


X[:, 1] = (X[:, 1] - np.mean(X[:, 1])) / np.std(X[:, 1])
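
Note that the line above centres the feature on its mean and then divides by the standard deviation, which is z-score standardization. Mean normalization in the strict sense divides by the feature range instead; a minimal sketch of that variant (a hypothetical alternative, not the scaling actually used in this lab) is:

def mean_normalize(col):
    # Mean normalization: centre on the mean and scale by the range (max - min)
    return (col - np.mean(col)) / (np.max(col) - np.min(col))

# Usage (would replace the z-score line above):
# X[:, 1] = mean_normalize(data['Mid'].values)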

# Splitting the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, Y,
test_size=0.20, random_state=0)
theta = np.zeros(2)
num_iters = 10000

# Function to compute cost


def compute_cost(X, y, theta):
    m = len(y)
    h = X.dot(theta)
    cost = (1 / (2 * m)) * np.sum(np.square(h - y))
    return cost

# Gradient descent function


def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    cost_history = []
    for i in range(num_iters):
        h = np.dot(X, theta)
        theta = theta - (alpha / m) * np.dot(X.T, (h - y))
        cost = compute_cost(X, y, theta)
        cost_history.append(cost)
    return theta, cost_history

# Testing the model with different learning rates


for rate in [0.001, 0.01, 0.1]:
    theta = np.zeros(2)
    theta, cost_history = gradient_descent(X_train, y_train, theta,
                                           rate, num_iters)
    print(theta)
    y_pred_test = np.dot(X_test, theta)
    r2 = r2_score(y_test, y_pred_test)
    print(f"Learning Rate: {rate}, Accuracy: {r2}")

# Observing the cost function for a limited number of iterations


limited_iters = 1000
alpha = 0.01
theta = np.zeros(2)
theta, cost_history = gradient_descent(X_train, y_train, theta, alpha,
limited_iters)
plt.plot(range(limited_iters), cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost Function over Iterations')
plt.show()

Output:

Figure 1: theta values and model accuracy for each learning rate

Figure 2: cost function over iterations (learning rate = 0.01)
Discussion:
From Figure 1, the accuracy for different learning rates is as follows:

Learning rate    Accuracy
0.005            0.907
0.01             0.913
0.1              0.913

Changing the learning rate has no major effect on accuracy in this case. The values of theta are also shown in Figure 1.
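
One way to make the effect of the learning rate visible is to plot the cost history of each run on the same axes. A small sketch, reusing the gradient_descent function and the training split defined above:

plt.figure()
for rate in [0.001, 0.01, 0.1]:
    theta = np.zeros(2)
    theta, cost_history = gradient_descent(X_train, y_train, theta, rate, num_iters)
    plt.plot(range(num_iters), cost_history, label=f'alpha = {rate}')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.legend()
plt.title('Cost for Different Learning Rates')
plt.show()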

For a learning rate of 0.01, the relationship between cost and iterations is shown in Figure 2. The cost function decreases with each iteration but becomes nearly constant after about 400 iterations; since there is no significant further decrease in cost beyond that point, further iterations are unnecessary and the program can be terminated there.
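
The same observation can be automated by stopping gradient descent as soon as the improvement in cost falls below a small tolerance, instead of fixing the number of iterations in advance. A minimal sketch, reusing compute_cost from above (the tolerance value is an assumption, not part of the lab code):

def gradient_descent_early_stop(X, y, theta, alpha, max_iters, tol=1e-6):
    # Same update rule as gradient_descent, but stop once the cost stops
    # decreasing by more than tol between consecutive iterations
    m = len(y)
    cost_history = [compute_cost(X, y, theta)]
    for i in range(max_iters):
        h = np.dot(X, theta)
        theta = theta - (alpha / m) * np.dot(X.T, (h - y))
        cost = compute_cost(X, y, theta)
        cost_history.append(cost)
        if cost_history[-2] - cost < tol:
            break
    return theta, cost_history

theta, cost_history = gradient_descent_early_stop(X_train, y_train, np.zeros(2), 0.01, num_iters)
print(f"Stopped after {len(cost_history) - 1} iterations, final cost {cost_history[-1]:.4f}")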
