Question 1 B
Cost Function:
The cost function measures how well the linear regression model fits the data. It is
defined as half the mean of the squared errors between the predicted values and the
actual values; smaller values indicate a better fit. The formula for the cost function is:
J(β0, β1, …, βn) = (1 / (2m)) * Σ(i=1 to m) (Yi - Ŷi)^2
Where:
m is the number of training examples.
Yi is the actual value of the dependent variable for the ith training example.
Ŷi is the predicted value of the dependent variable for the ith training example, that is, Ŷi = β0 + β1*Xi1 + … + βn*Xin.
The aim of the linear regression model is to minimize the cost function by finding the
optimal values of the regression coefficients β0, β1, …, βn.
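As a rough illustration of the formula above, the following sketch evaluates the cost for two coefficient vectors on a tiny made-up dataset. The function name `cost`, the toy data, and the use of `np.linalg.lstsq` to find the minimizing coefficients are assumptions made for this example; they are not taken from the original answer.

```python
import numpy as np

def cost(beta, X, y):
    """Half mean squared error: J = 1/(2m) * sum((y_i - y_hat_i)^2)."""
    m = len(y)
    y_hat = X @ beta  # predictions; X carries a leading column of ones for beta_0
    return np.sum((y - y_hat) ** 2) / (2 * m)

# Hypothetical toy data generated exactly from y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])

# An arbitrary starting guess versus the least-squares solution
beta_guess = np.array([0.0, 0.0])
beta_fit, *_ = np.linalg.lstsq(X, y, rcond=None)

cost_guess = cost(beta_guess, X, y)
cost_fit = cost(beta_fit, X, y)
print(f"cost at guess: {cost_guess}")
print(f"cost at least-squares fit: {cost_fit}")
```

The least-squares coefficients should give a cost close to zero here because the toy data lies exactly on a line, while the all-zero guess gives a much larger cost.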
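# Split the data into training and testing sets using the built-in function
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    data['YearsExperience'], data['Salary'], test_size=0.25, random_state=42)
# Single variable linear regression
model_single = LinearRegression()
model_single.fit(X_train.values.reshape(-1, 1), y_train)
# Evaluate on the held-out test split (assumed; the original metric code is missing)
y_pred = model_single.predict(X_test.values.reshape(-1, 1))
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse}")
print(f"RMSE: {rmse}")
print(f"R2: {r2}")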
plt.plot(training_sizes, rmse_scores)
plt.xlabel('% of training data')
plt.ylabel('RMSE')
plt.show()
MSE: 38802588.99247065
RMSE: 6229.172416338358
R2: 0.9347210011126782
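The plotting code above uses `training_sizes` and `rmse_scores`, but the code that builds them is not shown. One possible way to produce them is sketched below: fit the model on growing fractions of the training split and record the test RMSE each time. The synthetic data, the chosen percentages, and the loop itself are assumptions for illustration only; they are not the original dataset or the original method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary data (hypothetical values, not the original dataset)
rng = np.random.default_rng(42)
years = rng.uniform(1, 10, size=120)
salary = 25000 + 9500 * years + rng.normal(0, 6000, size=120)

X_train, X_test, y_train, y_test = train_test_split(
    years.reshape(-1, 1), salary, test_size=0.25, random_state=42)

training_sizes = [10, 25, 50, 75, 100]  # percent of the training split used
rmse_scores = []
for pct in training_sizes:
    # Use the first pct% of the training rows, but never fewer than two
    n = max(2, int(len(X_train) * pct / 100))
    model = LinearRegression().fit(X_train[:n], y_train[:n])
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    rmse_scores.append(rmse)

print(list(zip(training_sizes, rmse_scores)))
```

With the two lists built this way, the `plt.plot(training_sizes, rmse_scores)` call can be run unchanged.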
# Multivariable linear regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
X = housing.data
y = housing.target
MSE: 6.467064527275153e+30
RMSE: 2543042376224815.0
R^2: -4.89243206637211e+30