
International Islamic University Islamabad

Faculty of Engineering & Technology


Department of Electrical Engineering

Machine Learning Lab

Experiment No. 5: Linear Regression

Name of Student: ……………………………………

Registration No.: …………………………………….

Date of Experiment: …………………………………

Linear Regression
Linear regression is one of the most common and most easily understood techniques in statistics and machine learning, dating back some 200 years. It comes under predictive modelling: a kind of modelling where the possible output (Y) for a given input (X) is predicted based on previous data or values.

Types

1. Simple Linear Regression: It is characterized by one independent variable. If the price of a house is predicted from only one field, say the size of the plot, then that is simple linear regression.

2. Multiple Linear Regression: It is characterized by multiple independent variables. If the price of the house depends on more than one field, such as the size of the plot and the state of the economy, then it is multiple linear regression, which covers most real-world scenarios (see the sketch after this list).
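
To make the distinction concrete, here is a minimal sketch (the plot sizes, economy indices, and prices below are made-up numbers, and scikit-learn's LinearRegression is used for brevity): the only difference between the two cases is the number of feature columns.

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data: plot size (sq. yd), an economy index, and house price
plot_size = np.array([[500], [750], [1000], [1250]])
economy = np.array([[1.1], [0.9], [1.4], [1.2]])
price = np.array([100, 135, 210, 240])

# simple linear regression: a single feature column
simple = LinearRegression().fit(plot_size, price)
print(simple.intercept_, simple.coef_)    # one slope coefficient

# multiple linear regression: several feature columns
X_multi = np.hstack([plot_size, economy])
multi = LinearRegression().fit(X_multi, price)
print(multi.intercept_, multi.coef_)      # one coefficient per feature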

Equation:

y = b_0 + b_1*x_1 + b_2*x_2 + … + b_p*x_p

where b_0 is the intercept and b_1, …, b_p are the coefficients of the p independent variables x_1, …, x_p. Simple linear regression is the special case with a single term, y = b_0 + b_1*x.

Simple Linear Regression

Simple linear regression is an approach for predicting a response using a single feature.
It is assumed that the two variables are linearly related. Hence, we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x).
Let us consider a dataset where we have a value of response y for every feature x (this is the same dataset used in the code below):

x: 0 1 2 3 4 5 6 7 8 9
y: 1 3 2 5 7 8 8 9 10 12
For generality, we define:
x as the feature vector, i.e. x = [x_1, x_2, …, x_n],
y as the response vector, i.e. y = [y_1, y_2, …, y_n],
for n observations (in the above example, n = 10). A scatter plot of this dataset shows the points increasing roughly linearly with x.

Now, the task is to find the line which best fits the above scatter plot, so that we can predict the response for any new feature value (i.e., a value of x not present in the dataset). This line is called the regression line.
The equation of the regression line is represented as:

h(x_i) = b_0 + b_1*x_i
Here,
• h(x_i) represents the predicted response value for the i-th observation.
• b_0 and b_1 are regression coefficients and represent the y-intercept and the slope of the regression line, respectively.
To create our model, we must “learn” or estimate the values of regression coefficients b_0 and b_1.
And once we’ve estimated these coefficients, we can use the model to predict responses!
In this lab, we are going to use the Least Squares technique. Now consider:

e_i = y_i − h(x_i)

Here, e_i is the residual error in the i-th observation. So, our aim is to minimize the total residual error.
We define the squared error or cost function J as:

J(b_0, b_1) = Σ e_i²  (summed over i = 1, …, n)

and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is minimum!
Without going into the mathematical details, we present the result here:

b_1 = SS_xy / SS_xx
b_0 = ȳ − b_1·x̄

where x̄ and ȳ are the means of x and y, and SS_xy is the sum of cross-deviations of y and x:

SS_xy = Σ (x_i − x̄)(y_i − ȳ) = Σ x_i·y_i − n·x̄·ȳ

and SS_xx is the sum of squared deviations of x:

SS_xx = Σ (x_i − x̄)² = Σ x_i² − n·x̄²
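
As a worked check on the dataset above: x̄ = 4.5, ȳ = 6.5, Σ x_i·y_i = 389, and Σ x_i² = 285, so SS_xy = 389 − 10(4.5)(6.5) = 96.5 and SS_xx = 285 − 10(4.5)² = 82.5. This gives b_1 = 96.5 / 82.5 ≈ 1.1697 and b_0 = 6.5 − 1.1697 × 4.5 ≈ 1.2364, which is what the code below should print.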
Given below is the Python implementation of the above technique on our small dataset:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
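
As a quick cross-check (a minimal sketch added for verification, not part of the original listing), np.polyfit with degree 1 performs the same least-squares fit and should agree with estimate_coef:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit returns the coefficients highest power first: [b_1, b_0]
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)   # approximately 1.2364 and 1.1697, matching the formulas above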

Lab Task:
• Implement the multiple linear regression technique on the Boston house pricing dataset using Scikit-learn.
• Report the estimated coefficients obtained by the linear regression code.
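
As a starting point for the lab task, here is a minimal sketch. It assumes a scikit-learn version older than 1.2, where load_boston is still available (the dataset was removed in version 1.2; on newer versions, substitute another regression dataset such as fetch_california_housing):

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# load the Boston house pricing dataset (13 features, median home price target)
boston = load_boston()
X, y = boston.data, boston.target

# hold out part of the data to evaluate the fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# fit multiple linear regression and report the estimated coefficients
reg = LinearRegression().fit(X_train, y_train)
print("Intercept (b_0):", reg.intercept_)
print("Coefficients:", reg.coef_)
print("R^2 on test data:", reg.score(X_test, y_test))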
