
Machine Learning

Linear Regression

Quan Minh Phan & Ngoc Hoang Luong

University of Information Technology - Vietnam National University Ho Chi Minh City

October 7, 2022

New Packages

numpy → very frequently used in ML (Python)


Link: https://numpy.org/doc/stable/user/index.html#user

>> import numpy as np

matplotlib → for visualization


Link: https://matplotlib.org/stable/tutorials/index.html

>> import matplotlib.pyplot as plt

Generate A Regression Problem

>> from sklearn.datasets import make_regression


>> X, y = make_regression(n_samples=500, n_features=1,
n_informative=1, noise=25, random_state=42)

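For reference, with these settings make_regression returns a 2-D feature array and a 1-D target vector, which is the shape the code below expects:

>> X.shape, y.shape
>> ((500, 1), (500,))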
Data Visualization

>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
plt.xlabel('X')
plt.ylabel('y')
plt.show()

Recall (Linear Regression)

Figure: The general concept of Linear Regression

Minimizing cost function with gradient descent

Cost function (Squared Error):

J(w) = \frac{1}{2} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (1)

Update the weights:


w_{t+1} := w_t + \Delta w    (2)

\Delta w = -\eta \nabla J(w)    (3)

\frac{\partial J}{\partial w_j} = -\sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (4)

\Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (5)

Minimizing cost function with gradient descent (cont.)

w_j := \begin{cases} w_j + \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) & \text{if } j = 0 \\ w_j + \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)} & \text{if } j \in [1, \ldots, n] \end{cases}

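To connect the update rule with code, here is a minimal NumPy sketch of a single update on the generated data (eta, w, y_hat and diff are illustrative names; the same logic appears inside the fit method implemented later):

>> eta = 0.001
w = np.zeros(1 + X.shape[1])          # w[0]: intercept, w[1:]: coefficients
y_hat = np.dot(X, w[1:]) + w[0]       # predictions with the current weights
diff = y - y_hat                      # y^(i) - yhat^(i)
w[0] += eta * np.sum(diff)            # case j = 0 (intercept)
w[1:] += eta * np.dot(X.T, diff)      # case j in [1, ..., n] (coefficients)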
Pseudocode of the Training Process

Algorithm 1 Gradient Descent


1: Initialize the weights w
2: while the stopping criterion is not satisfied do
3:    Compute the output values ŷ
4:    Compute the difference between y and ŷ
5:    Update the weights:
6:       Update the intercept
7:       Update the coefficients
8: end while

Components

Hyperparameters
eta (float): the initial learning rate
max_iter (int): the maximum number of iterations
random_state (int): the seed used to initialize the weights

Parameters
w (list/array): the weight values
costs (list/array): the list containing the cost values over iterations

Methods
fit(X, y)
predict(X)

Implement (code from scratch)

class LinearRegression_GD:

    def __init__(self, eta=0.001, max_iter=20, random_state=42):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def predict(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

'fit' method

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iters in range(self.max_iter):
            y_pred = self.predict(X)
            diff = y - y_pred
            self.w[0] += self.eta * np.sum(diff)        # update the intercept
            for j in range(X.shape[1]):                 # j = 0, 1, ..., X.shape[1] - 1
                delta = 0.0
                for i in range(X.shape[0]):             # i = 0, 1, ..., X.shape[0] - 1
                    delta += self.eta * diff[i] * X[i][j]
                self.w[j + 1] += delta                  # update coefficient j
            cost = np.sum(diff ** 2) / 2
            self.costs.append(cost)

'fit' method (2)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iters in range(self.max_iter):
            y_pred = self.predict(X)
            diff = y - y_pred
            self.w[0] += self.eta * np.sum(diff)          # update the intercept
            self.w[1:] += self.eta * np.dot(X.T, diff)    # vectorized update of the coefficients
            cost = np.sum(diff ** 2) / 2
            self.costs.append(cost)

Train Model

Gradient Descent
>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=42)
reg_GD.fit(X, y)

Visualize the trend in the cost values (Gradient Descent)

>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.title('Gradient Descent')
plt.show()

Visualize on Data

>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Weight values

>> w_GD = reg_GD.w
w_GD
>> [-0.9794002, 63.18592509]

Implement (package)

Stochastic Gradient Descent


from sklearn.linear_model import SGDRegressor

Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Implement (package) (cont.)

Normal Equation
from sklearn.linear_model import LinearRegression

Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Differences

Gradient Descent
w := w + \Delta w
\Delta w = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x^{(i)}

Stochastic Gradient Descent


w := w + \Delta w
\Delta w = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x^{(i)}

Normal Equation
w = (X^T X)^{-1} X^T y

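As a sanity check, the normal-equation solution can be computed directly with NumPy; this is only a sketch (X_b and w_ne are illustrative names, with a column of ones standing in for the intercept), not part of the lecture code:

>> X_b = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column of ones
w_ne = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)      # solve (X^T X) w = X^T y
w_ne                                                # [intercept, coefficient]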
Practice (cont.)

Stochastic Gradient Descent


>> from sklearn.linear_model import SGDRegressor
>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20,
random_state=42, learning_rate='constant')
reg_SGD.fit(X, y)

Normal Equation
>> from sklearn.linear_model import LinearRegression
>> reg_NE = LinearRegression()
reg_NE.fit(X, y)

Weight Values Comparisons

Gradient Descent (ours)


>> w_GD = reg_GD.w
w_GD
>> [-0.9794002, 63.18592509]

Stochastic Gradient Descent


>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
w_SGD
>> [-1.02681553, 63.08630288]

Normal Equation
>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
w_NE
>> [-0.97941333, 63.18605572]
Visualize on Data (all)

>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
plt.plot(X, reg_SGD.predict(X), color='black', lw=4, label='Stochastic Gradient Descent')
plt.plot(X, reg_NE.predict(X), color='orange', lw=2, label='Normal Equation')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Performance Evaluation

Mean Absolute Error (MAE)

MAE(y, \hat{y}) = \frac{1}{n} \sum_i \left| y^{(i)} - \hat{y}^{(i)} \right|    (6)

Mean Squared Error (MSE)

MSE(y, \hat{y}) = \frac{1}{n} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (7)

R-Squared (R2)
R^2(y, \hat{y}) = 1 - \frac{\sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_i \left( y^{(i)} - \bar{y} \right)^2}    (8)

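The three metrics can be written directly from equations (6)-(8); the sketch below only restates the formulas in NumPy (y_pred stands for any of the prediction vectors computed on the next slide; the sklearn implementations are what the lecture actually uses):

>> mae = np.mean(np.abs(y - y_pred))                                 # equation (6)
mse = np.mean((y - y_pred) ** 2)                                     # equation (7)
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)   # equation (8)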
Performance Evaluation

>> from sklearn.metrics import mean_absolute_error as MAE
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score as R2

>> y_pred_GD = reg_GD.predict(X)

>> y_pred_SGD = reg_SGD.predict(X)

>> y_pred_NE = reg_NE.predict(X)

Performance Evaluation (cont.)

Mean Absolute Error


>> print('MAE of GD:', round(MAE(y, y_pred_GD), 6))
print('MAE of SGD:', round(MAE(y, y_pred_SGD), 6))
print('MAE of NE:', round(MAE(y, y_pred_NE), 6))

Mean Squared Error


>> print('MSE of GD:', round(MSE(y, y_pred_GD), 6))
print('MSE of SGD:', round(MSE(y, y_pred_SGD), 6))
print('MSE of NE:', round(MSE(y, y_pred_NE), 6))

R^2 score
>> print('R2 of GD:', round(R2(y, y_pred_GD), 6))
print('R2 of SGD:', round(R2(y, y_pred_SGD), 6))
print('R2 of NE:', round(R2(y, y_pred_NE), 6))

Run Gradient Descent with lr = 0.005

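The resulting cost plot is not reproduced here; rerunning the experiment with the larger learning rate follows the same pattern as before (reg_GD2 is an illustrative name):

>> reg_GD2 = LinearRegression_GD(eta=0.005, max_iter=20, random_state=42)
reg_GD2.fit(X, y)
plt.plot(range(1, len(reg_GD2.costs) + 1), reg_GD2.costs)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.title('Gradient Descent (eta = 0.005)')
plt.show()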
Polynomial Regression

Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]

>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,
480.0, 586.0])[:, np.newaxis]
y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
391.2, 390.8])

>> plt.scatter(X, y, label='Training points')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Visualize data

Experiment with Linear Regression

>> from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)

Experiment with Linear Regression (cont.)

Experiment with Polynomial Regression

Syntax
from sklearn.preprocessing import PolynomialFeatures

>> from sklearn.preprocessing import PolynomialFeatures
quadratic = PolynomialFeatures(degree=2)
X_quad = quadratic.fit_transform(X)
pr = LinearRegression()
pr.fit(X_quad, y)

Experiment with Polynomial Regression (cont.)

>> X_test = np.arange(250, 600, 10)[:, np.newaxis]

>> y_pred_linear = lr.predict(X_test)
y_pred_quad = pr.predict(quadratic.fit_transform(X_test))

>> plt.scatter(X, y, label='Training points')
plt.xlabel('X')
plt.ylabel('y')
plt.plot(X_test, y_pred_linear, label='Linear fit', c='black')
plt.plot(X_test, y_pred_quad, label='Quadratic fit', c='orange')
plt.legend()
plt.show()

Practice

Dataset: 'Boston Housing' (housing.csv) (14 attributes: 13 independent variables + 1 target variable)

File: boston_housing.ipynb

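A minimal starting sketch for the exercise, assuming housing.csv is comma-separated with the 13 feature columns first and the target in the last column (adjust to the actual file layout; X_house and y_house are illustrative names):

>> import pandas as pd
data = pd.read_csv('housing.csv')        # assumed comma-separated; target in the last column
X_house = data.iloc[:, :-1].values       # 13 independent variables
y_house = data.iloc[:, -1].values        # 1 target variable
reg = LinearRegression()
reg.fit(X_house, y_house)
print('R2:', R2(y_house, reg.predict(X_house)))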
