
INTRODUCTION TO

AI AND MACHINE
LEARNING
UNDERSTANDING REGRESSION
• What is regression
• Linear regression
• Implementation issues
WHAT IS REGRESSION

Regression is a method to find the relationship between
independent variables and a dependent variable.

It is very well understood and mathematically defined.

It is used in supervised learning, as labelled data are required
to create and train the model.

INTRO TO AI & ML | DIP IN BORDER SECURITY | SINGAPORE POLYTECHNIC


WHAT IS REGRESSION

Regression can be broadly classified into:


▪ Linear Regression
▪ The model created is linear (i.e. a line) between the input variable(s) and the output variable
▪ The output (dependent) variable is continuous, while the input variable(s) can be either
categorical or continuous

▪ Logistic Regression
▪ The model created is usually sigmoidal (i.e. S-shaped) between the input variable(s) and the
output variable
▪ The output variable is usually categorical
▪ Binary Logistic Regression – only 2 possible outcomes in the output (i.e. Success / Failure)
▪ Multinomial Logistic Regression – >2 possible outcomes in the output + no ordering
▪ Ordinal Logistic Regression – >2 possible outcomes in the output + an order associated with the output
WHAT IS REGRESSION
REGRESSION VS CLASSIFICATION
▪ Regression (the method) studies the relationship between inputs
and outputs, BUT the same method can also be used as a
classifier (i.e. logistic regression)
▪ The objective (or problem) determines which regression method
is suitable.

▪ We will use linear regression in the rest of this lesson: since it
is not suitable as a classifier, this reduces possible confusion
between the objective (of finding a relationship) and the method
Source : https://kindsonthegenius.com/blog/what-is-the-difference-between-classification-and-regression/
HOW LINEAR REGRESSION WORKS
SINGLE DIMENSION LINEAR REGRESSION

▪ For example:
▪ X (independent variable)
▪ Y (dependent variable)

▪ How do we draw the line that gives the best fit?
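The best-fit line can be computed directly with the ordinary least squares formulas. A minimal sketch in plain Python, assuming small illustrative data (the function name `fit_line` is ours):

```python
# Single-dimension linear regression by ordinary least squares:
# slope w = cov(x, y) / var(x), intercept b = mean(y) - w * mean(x)
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

w, b = fit_line([1, 2, 3], [2, 4, 6])   # points on y = 2x
```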



HOW LINEAR REGRESSION WORKS
FINDING THE BEST LINE

▪ To get the “best fit” for the training samples, the line should
minimize the error between the observed y and predicted ŷ
values in the training data.



HOW LINEAR REGRESSION WORKS
FINDING THE BEST LINE



HOW LINEAR REGRESSION WORKS
HOW GOOD IS THE FIT

Linear regression uses 2 metrics to check how “good” the model is

▪ Root Mean Squared Error (RMSE)
▪ How erroneous the model’s predictions are when compared with the actual
observed values
▪ High RMSE – 👎, Low RMSE – 👍

▪ Coefficient of determination (R²)
▪ Measures the strength of the relationship between the response and the predictor
variables in the model
▪ If R² = 0.65, the predictor variables explain 65% of the variance in the
response variable
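Both metrics are easy to compute by hand. A sketch, assuming `y_true` and `y_pred` are plain Python lists (the function names are ours):

```python
import math

def rmse(y_true, y_pred):
    # square root of the mean squared difference between observed and predicted
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```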
Source : https://medium.com/wwblog/evaluating-regression-models-using-rmse-and-r%C2%B2-42f77400efee
HOW LINEAR REGRESSION WORKS
HOW GOOD IS THE FIT

▪ RMSE
▪ Range is dependent on the response (output) variable

▪ R²
▪ Range 0 - 1
▪ General formula (there is another formula for the entire population)



HOW LINEAR REGRESSION WORKS
HOW GOOD IS THE FIT

These are the possible outcomes:


▪ Low RMSE, high R² (the best case)
▪ Low RMSE, low R²
▪ High RMSE, high R²
▪ High RMSE, low R² (the worst case)



HOW LINEAR REGRESSION WORKS
HOW GOOD IS THE FIT

Low RMSE, high R² (the best case)

▪ R² = 0.98
▪ Knowing X will help in predicting Y

▪ RMSE = 5.1 (errors range from about -10 to 10)
▪ The Y values are in the order of 10²
(5 vs 100-300), hence the RMSE is small


HOW LINEAR REGRESSION WORKS
HOW GOOD IS THE FIT

Low RMSE, low R²

▪ R² = 0
▪ Knowing X is useless for predicting Y (all
values are around 300)

▪ RMSE = 5 (errors range from about -10 to 10)
▪ The Y values are in the order of 10²
(5 vs 300), hence the RMSE is small



HOW LINEAR REGRESSION WORKS
HOW GOOD IS THE FIT

▪ High RMSE, high R²
▪ With a high R², there is value in the prediction
▪ But the high error (RMSE) results in inaccurate predicted values (output)

▪ High RMSE, low R² (the worst case)
▪ The prediction is groundless
▪ The error is high



ACTIVITY
LINEAR REGRESSION CALCULATOR

X Y
1 2
2 3
3 6
4 7
5 9
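To check the result from an online calculator, the least squares formulas can be applied to this table directly; a sketch:

```python
# Least squares fit for the activity data above
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 6, 7, 9]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
# slope ≈ 1.8, intercept ≈ 0.0, i.e. y ≈ 1.8x
```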



HOW LINEAR REGRESSION WORKS
MULTI DIMENSION LINEAR REGRESSION

In real life, multiple input variables are often used.

For example:
X1 = Distance from MRT station
X2 = House age
Y = Price ($ psf)

The inputs / outputs are stored as matrices, and the
equation needs multiple coefficients.



HOW LINEAR REGRESSION WORKS
MULTI DIMENSION LINEAR REGRESSION

▪ Each x variable will have its own w (another
matrix)
▪ The error E can be calculated
▪ By setting the partial derivative with respect to
each w to 0, each coefficient for the respective
x variable can be optimized.
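This matrix derivation can be sketched with NumPy; the housing-style data below is made up for illustration, and column 0 is the bias term:

```python
import numpy as np

# Multi-dimension linear regression: setting the partial derivatives of the
# squared error E to zero leads to the least squares solution of X w = y
X = np.array([[1.0, 0.5, 10.0],    # [bias, distance from MRT (km), house age]
              [1.0, 1.2, 30.0],
              [1.0, 2.0,  5.0],
              [1.0, 0.8, 20.0]])
y = np.array([1500.0, 1100.0, 1200.0, 1350.0])   # price ($ psf), made up

w, *_ = np.linalg.lstsq(X, y, rcond=None)   # one coefficient per column
y_hat = X @ w                               # model predictions
```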



HOW LINEAR REGRESSION WORKS
MULTI DIMENSION LINEAR REGRESSION

▪ Once we have computed the coefficients and
verified the goodness of fit (RMSE and R²),
the modelling is done.

▪ Typically, it is useful to plot the data on a
graph with the prediction plane to inspect
the model

Source: https://towardsdatascience.com/linear-regression-made-easy-how-does-it-work-and-how-to-use-it-in-python-be0799d2f159
HOW LINEAR REGRESSION WORKS
MULTI DIMENSION LINEAR REGRESSION

▪ Additional analysis work
▪ Create and use derived features (columns) (i.e. age instead of
year-of-birth)
▪ Derived features (x0 .. x3, in this case) could also be powers of a basic
feature (x), making the model a polynomial

▪ Multi-dimensional linear regression can thus fit polynomial curves,
planes and even hyper-planes



HOW LINEAR REGRESSION WORKS
GRADIENT DESCENT

▪ Gradient descent is an optimization technique used to find the
minimum of arbitrarily complex error functions.
▪ It is easy to understand and widely used in ML techniques.
▪ Given the error function, it can find the weights that give the
lowest errors.
▪ Not all error functions can be minimized analytically by
differentiation, hence gradient descent is used.



HOW LINEAR REGRESSION WORKS
GRADIENT DESCENT

▪ Steps
▪ Pick a random set of weights
▪ Iteratively adjust the weights in the direction of the gradient of the error
▪ When the gradient approaches 0 → minimum error, convergence
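The steps above can be sketched for single-dimension linear regression; the toy data, learning rate and iteration count are our assumptions:

```python
import random

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]                 # generated from y = 2x + 1

random.seed(0)
w, b = random.random(), random.random()   # step 1: random initial weights
lr = 0.05                                 # learning rate

for _ in range(5000):                     # step 2: iterate
    # gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b
    if abs(grad_w) < 1e-9 and abs(grad_b) < 1e-9:   # step 3: convergence
        break
# w approaches 2 and b approaches 1
```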

Source : https://towardsdatascience.com/how-to-do-linear-regression-using-gradient-descent-79a2ff4ace05


HOW LINEAR REGRESSION WORKS
GRADIENT DESCENT CONSIDERATIONS

▪ The learning rate is an example of a hyper-parameter

▪ Choosing a suitable value for this hyper-parameter matters:
▪ Too small – convergence takes a very long time
▪ Too large – convergence may never happen, as the iterations
bounce between different sides of the minima
▪ Many tools can automatically determine whether gradient
descent has converged (e.g. Orange3)



HOW LINEAR REGRESSION WORKS
GRADIENT DESCENT CONSIDERATIONS

▪ If convergence needs to be detected manually, plot
▪ iteration number (x axis)
vs
▪ cost function (y axis)

▪ Typically, the graph will look as follows



HOW LINEAR REGRESSION WORKS
GRADIENT DESCENT CONSIDERATIONS

▪ Batch gradient descent
▪ Take the average of the gradients over all the training examples (the whole dataset)
▪ Use the mean gradient to update the weights

▪ Stochastic gradient descent (SGD)
▪ Take 1 training example
▪ Calculate the gradient and update the weights
▪ Repeat with the next example (for all examples)
▪ The cost function will decrease overall but will fluctuate
▪ May keep dancing around and never reach the minima

▪ Mini-batch gradient descent
▪ Like SGD, except in batches of 50-256 examples
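A sketch contrasting the three schemes, assuming toy data; the only difference between them is how many examples feed each weight update (the function names are ours):

```python
import random

xs = list(range(1, 6))
ys = [2 * x + 1 for x in xs]            # toy data on y = 2x + 1
data = list(zip(xs, ys))

def grad(batch, w, b):
    # mean gradient of the squared error over the given examples
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def train(batch_size, lr=0.02, epochs=3000, seed=0):
    rng = random.Random(seed)
    w = b = 0.0
    rows = data[:]
    for _ in range(epochs):
        rng.shuffle(rows)
        # batch_size == len(rows) -> batch GD; 1 -> SGD; else mini-batch
        for i in range(0, len(rows), batch_size):
            gw, gb = grad(rows[i:i + batch_size], w, b)
            w -= lr * gw
            b -= lr * gb
    return w, b
```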
IMPLEMENTING LINEAR REGRESSION

▪ Generalization
▪ Over-fitting
▪ Regularization



IMPLEMENTING LINEAR REGRESSION
OVERFITTING AND GENERALIZATION

▪ Assumption
▪ Limited dataset
▪ The system could “remember” all the data
points

▪ Over-fitting (green line)
▪ Perfect prediction for the known training data
▪ Likely not as good for unseen data

▪ Generalization (black line)
▪ More suitable for new, unseen data

Source : https://en.wikipedia.org/wiki/Overfitting
IMPLEMENTING LINEAR REGRESSION
OVERFITTING AND GENERALIZATION

▪ “Over-fitting” will increase the generalization error

▪ To reduce the generalization error, we should
▪ Collect as many samples as possible
▪ Use a random subset of the data for training
▪ Keep the training set separate from the test set
▪ Experiment with adding higher-degree polynomial terms (x¹, x², x³)
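The "random subset for training, separate test set" advice can be sketched as a small split helper; the function name and 80/20 ratio are our choices:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # random subset, reproducible via seed
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]       # disjoint train and test sets

train, test = train_test_split(list(range(100)))
```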



IMPLEMENTING LINEAR REGRESSION
L1 REGULARIZATION (LASSO)

▪ L1 regularization is also known as Lasso regression (Least
Absolute Shrinkage and Selection Operator)

▪ It shrinks the less important features’ coefficients to 0
▪ Effectively removing the low-impact feature(s)

▪ L1 regularization encourages only a few coefficients to be
non-zero; many are zero.
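The shrink-to-zero behaviour can be sketched with proximal gradient descent (one standard way to fit a Lasso, not necessarily the one a given tool uses); the data is made up so that the second feature is irrelevant to y:

```python
# L1-regularized linear regression via proximal gradient descent (ISTA)
def soft(v, t):
    # soft-thresholding: the proximal operator of the L1 penalty
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
y = [3.0, 6.0, 9.0, 12.0]        # depends only on the first feature (y = 3*x1)
lam, lr = 0.1, 0.05
w = [0.0, 0.0]
for _ in range(2000):
    g = [0.0, 0.0]               # gradient of the mean squared error
    for row, target in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, row)) - target
        for j in range(2):
            g[j] += 2 * err * row[j] / len(X)
    w = [soft(wj - lr * gj, lr * lam) for wj, gj in zip(w, g)]
# the irrelevant feature's coefficient is driven exactly to 0
```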



IMPLEMENTING LINEAR REGRESSION
L2 REGULARIZATION (RIDGE)

▪ L2 regularization is also known as Ridge regression

▪ It reduces model complexity by preventing over-fitting to outliers

▪ It adds an additional term to the cost function that penalizes
large weights, thereby minimizing this skew.
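With the penalty lam·‖w‖² added to the squared-error cost, setting the gradient to zero gives the closed form w = (XᵀX + lam·I)⁻¹Xᵀy. A sketch on made-up data showing the weights shrinking as lam grows:

```python
import numpy as np

def ridge(X, y, lam):
    # closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([8.0, 7.0, 18.0, 17.0])     # generated from y = 2*x1 + 3*x2

w_plain = ridge(X, y, 0.0)       # lam = 0: plain least squares
w_shrunk = ridge(X, y, 100.0)    # large lam: noticeably smaller weights
```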



IMPLEMENTING LINEAR REGRESSION
CATEGORICAL INPUTS

▪ For inputs that are categories (e.g. gender) rather than
numbers, we represent the category values as a one-hot
encoding for use in the linear regression equations (if there is
no built-in support from the data analysis tool).

▪ Examples
▪ Male = x1
▪ Female = x2
▪ Other features = x3 …
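A minimal sketch of one-hot encoding a categorical column by hand, for tools with no built-in support; the function name is ours, and categories are sorted for a deterministic column order:

```python
def one_hot(values):
    # one column per distinct category; exactly one 1 per row
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

one_hot(["Male", "Female", "Female", "Male"])
# "Female" sorts before "Male" -> [[0, 1], [1, 0], [1, 0], [0, 1]]
```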
REGRESSION APPLICATIONS

Regression is usually used for

1. Forecasting
2. The Capital Asset Pricing Model (CAPM)
▪ Establishes the link between an asset's projected return and the
related market risk premium, using a linear regression model
3. Identifying problems
▪ Based on regression and other statistical analysis of in-house data
4. Comparing with the competition
END OF LESSON 3
