
Regression Models

By Mayuri Bhandari

1
Regression
Regression is a method of modeling a target value based on independent predictors.
This method is mostly used for forecasting and for finding cause-and-effect relationships between variables.
Regression techniques mostly differ in the number of independent variables and the type of relationship between the independent and dependent variables.
Regression is used when the target variable is continuous data.

2
Regression
The variable that the equation in your linear
regression model is predicting is called
the dependent variable. We call that one y.
The variables that are being used to predict the
dependent variable are called the independent
variables. We call them X.
The equation that describes how y is related to X is
called the regression model
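For example, with the notation used later in these slides, a linear regression model takes the form
y = b0 + b1*X1 + b2*X2 + … + bn*Xn
with a single X term in the simple linear case.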

3
Regression Models in a frame
Linear Regression Model: Simple, Multiple, Polynomial
Non-linear Regression Model: Decision Tree, Support Vector, Random Forest
4
Linear Regression
Models

5
Simple Linear Regression
Simple linear regression is used to estimate the relationship between two quantitative variables.
You can use simple linear regression when you want to know:
1. How strong the relationship is between two variables (e.g. the relationship between rainfall and soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g. the amount of soil erosion at a certain level of rainfall).
The relationship between the independent and dependent variable is linear: the line of best fit through the data points is a straight line.
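A minimal sketch of simple linear regression with scikit-learn, assuming one illustrative feature (e.g. years of experience) and a continuous target (e.g. salary); the data values below are purely hypothetical:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: years of experience -> salary (in thousands)
X = np.array([[1], [2], [3], [4], [5]])   # one independent variable
y = np.array([35, 42, 50, 55, 63])        # dependent variable

model = LinearRegression()
model.fit(X, y)

print(model.intercept_, model.coef_)      # b0 and b1
print(model.predict([[6]]))               # predicted salary at 6 years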
6
Simple Linear Regression

7
Simple Linear Regression


8
Ordinary Least Squares
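Ordinary least squares chooses the coefficients b0 and b1 that minimize the sum of squared residuals, Σ(yᵢ - (b0 + b1*xᵢ))². A minimal sketch of the closed-form solution with NumPy, assuming 1-D arrays x and y of illustrative values:

import numpy as np

# Hypothetical sample data: years of experience vs. salary
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([35.0, 42.0, 50.0, 55.0, 63.0])

# Closed-form OLS estimates for y = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)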

9
Practical Activity

10
Multiple Linear Regression

11
Multiple Linear Regression
Multiple linear regression (MLR) is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.
The goal of MLR is to model the linear relationship between the independent variables and the response (dependent) variable.
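A minimal sketch of MLR with scikit-learn, assuming two illustrative explanatory variables; the data and column meanings are hypothetical:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: [R&D spend, marketing spend] -> profit
X = np.array([[165, 471], [162, 443], [153, 407], [144, 383], [142, 366]])
y = np.array([192, 191, 191, 182, 166])

model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_)   # b0 and [b1, b2]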

12
Dummy Variables

13
Dummy Variables

When one independent variable can be predicted from the others, it is called multicollinearity; to avoid it, always omit one dummy column (the dummy variable trap).
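A minimal sketch of creating dummy variables with pandas, assuming a hypothetical categorical column named "State"; drop_first=True drops one category to avoid the trap noted above:

import pandas as pd

# Hypothetical dataset with one categorical feature
df = pd.DataFrame({
    "State": ["New York", "California", "Florida", "New York"],
    "Profit": [192, 191, 191, 182],
})

# One dummy column per category, dropping the first to avoid multicollinearity
df = pd.get_dummies(df, columns=["State"], drop_first=True)
print(df)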

14
What is the P-value?
H0: Coin is Fair.   Ha: Coin is not Fair.
Example: tossing a coin and getting heads every time. Under H0, the probability of that run of heads halves with each additional toss: 0.5, 0.25, 0.12, 0.06, 0.03, 0.01.
15
What is the P-value?
It gives a value to the weirdness of your sample.
If you have a large P-value, then you probably won’t
change your mind about the null hypothesis.
A large value means that it wouldn’t be at all
surprising to get a sample like yours if the hypothesis
is true.
As the P-value gets smaller, you should probably start
to ask yourself some questions.
You might want to change your mind and maybe even
reject the hypothesis.
16
Multiple Linear Regression
There are essentially five methods of building a
multiple linear regression model.
1. Chuck Everything In and Hope for the Best
2. Backward Elimination (see the sketch after this list)
3. Forward Selection
4. Bidirectional Elimination
5. Score Comparison
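A minimal sketch of backward elimination using statsmodels, assuming a feature matrix X (as a pandas DataFrame), a target y, and a significance level of 0.05; all names are illustrative:

import statsmodels.api as sm

def backward_elimination(X, y, sl=0.05):
    # X: pandas DataFrame of candidate features, y: target vector
    X = sm.add_constant(X)                 # add intercept column
    while True:
        model = sm.OLS(y, X).fit()
        worst = model.pvalues.drop("const").idxmax()
        if model.pvalues[worst] > sl:
            X = X.drop(columns=[worst])    # remove least significant feature
        else:
            return model, list(X.columns)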

17
Practical Activity

18
Polynomial Regression
Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data with a curvilinear relationship between the target variable and the independent variables.
If your data points clearly will not fit a linear regression (a straight line through the data points), polynomial regression might be a better fit.
Polynomial regression, like linear regression, uses the relationship between the variables x and y to find the best way to draw a line (curve) through the data points.
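A minimal sketch of polynomial regression with scikit-learn, assuming a single feature X and target y (illustrative data); the degree is a modelling choice:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Illustrative curvilinear data
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([3, 6, 12, 25, 45, 80])

poly = PolynomialFeatures(degree=2)        # builds [1, x, x^2] features
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)                       # still linear in the coefficients
print(model.predict(poly.transform([[6.5]])))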

19
Polynomial Regression

20
Polynomial Regression

Underfitting (straight line): Y = b0 + b1*X

Polynomial model: Y = b0 + b1*X + b2*X^2 + … + bn*X^n

21
Why Linear?
This is still considered a linear model because the coefficients/weights associated with the features are still linear.
x² is only a feature.
However, the curve that we are fitting is quadratic in nature.

22
Polynomial Regression

23
Practical Activity

24
Non-Linear
Regression Models

25
Support Vector Regression (SVR)
Support Vector Machines (SVMs) are well known for classification problems.
SVR is a bit different from SVM.
As the name suggests, SVR is a regression algorithm, so we use SVR to predict continuous values, whereas SVM is used for classification.

26
Support Vector Regression (SVR)

27
Terminology
1. Hyperplane: In SVM this is the separation line between the data classes. In SVR it is the line that helps us predict the continuous (target) value.
2. Boundary line: In SVM there are two lines other than the hyperplane which create a margin. The support vectors can be on the boundary lines or outside them. This boundary line separates the two classes. In SVR the concept is the same.
3. Support vectors: These are the data points which are closest to the boundary; their distance to the boundary is the minimum (least).
4. Kernel: The function used to map lower-dimensional data into higher-dimensional data.
28
Terminology
• Hyperplane: In SVM this is basically the separation line between the data classes. In SVR we define it as the line that will help us predict the continuous (target) value.
• Boundary line: In SVR, as in SVM, there are two lines other than the hyperplane which create a margin. The support vectors can be on the boundary lines or outside them. This boundary line separates the two classes.
• Support vectors: These are the data points which are closest to the boundary; their distance to the boundary is the minimum (least).
• Kernel: The function used to map lower-dimensional data into higher-dimensional data.

29
Why SVR?
In simple regression we try to minimize the error rate, while in SVR we try to fit the error within a certain threshold.
Our objective in SVR is to consider only the points that are within the boundary lines.
Our best-fit line is the hyperplane that has the maximum number of points within that boundary.

30
SLR Vs SVR

31
SVR

32
SVR
Assuming that the equation of the hyperplane is as follows:
Y = wx + b
Then the equations of the decision boundaries become:
wx + b = +a
wx + b = -a
Thus, any hyperplane that satisfies our SVR should satisfy:
-a < Y - (wx + b) < +a

33
SVR
Our main aim here is to decide a decision boundary at
‘a’ distance from the original hyperplane such that
data points closest to the hyperplane or the support
vectors are within that boundary line.
Hence, we are going to take only those points that are
within the decision boundary and have the least error
rate, or are within the Margin of Tolerance.
This gives us a better fitting model.
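A minimal sketch of SVR with scikit-learn, assuming a single feature X and target y (illustrative data); SVR is sensitive to feature scale, so the values are standardized first:

import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

# Illustrative data
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([3.0, 6.0, 12.0, 25.0, 45.0, 80.0])

sc_X, sc_y = StandardScaler(), StandardScaler()
X_s = sc_X.fit_transform(X)
y_s = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf")       # radial basis function kernel
regressor.fit(X_s, y_s)

pred_s = regressor.predict(sc_X.transform([[6.5]]))
print(sc_y.inverse_transform(pred_s.reshape(-1, 1)))   # back to original scale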

34
Advantages
1. SVM works really well with high dimensional data.
2. If your data is in higher dimensions, it is wise to use
SVR.
3. For data with a clear margin of separation, SVM works relatively well.
4. It is robust to outliers.
5. It has excellent generalization capability, with high
prediction accuracy.

35
Disadvantages
It is a bad option when the data has no clear margin of
separation i.e. the target class contains overlapping
data points.
It does not work well with large data sets.

36
Practical Activity

37
Decision Tree Regression
Decision tree builds regression or classification models in the form
of a tree structure.
It breaks down a dataset into smaller and smaller subsets while at
the same time an associated decision tree is incrementally
developed.
The final result is a tree with decision nodes and leaf nodes.
A decision node has two or more branches, each representing
values for the attribute tested.
A leaf node represents a decision on the numerical target.
The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
Decision trees can handle both categorical and numerical data.

38
Decision Tree Regression
Decision trees are constructed via an algorithmic approach that
identifies ways to split a data set based on different conditions.
It is one of the most widely used and practical methods for
supervised learning.
Decision Trees can be used for both classification and
regression tasks.
Tree models where the target variable can take a discrete set of
values are called classification trees.
Decision trees where the target variable can take continuous
values (typically real numbers) are called regression trees.
Classification And Regression Tree (CART) is the general term covering both.
39
Decision Tree Regression

40
Decision Tree Regression

41
Decision Tree Regression

42
Decision Tree Regression

43
Decision Tree Regression

44
Decision Tree Regression

45
Decision Tree Regression
A decision tree is built top-down from a root node and
involves partitioning the data into subsets that contain
instances with similar values (homogenous).
We use standard deviation to calculate the homogeneity of a numerical sample.
If the numerical sample is completely homogeneous, its standard deviation is zero.
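A minimal sketch of how a regression tree can score a candidate split by standard deviation reduction, assuming a numeric target y and a boolean mask describing the split (data and names are illustrative):

import numpy as np

def std_reduction(y, left_mask):
    # Standard deviation of the parent minus the weighted std of the two children
    y = np.asarray(y, dtype=float)
    left, right = y[left_mask], y[~left_mask]
    weighted = (len(left) * left.std() + len(right) * right.std()) / len(y)
    return y.std() - weighted

# Illustrative target values and one candidate split
y = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46]
mask = np.array([True, True, False, False, False, True, False, True, True, False])
print(std_reduction(y, mask))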

47
Advantages
1. Compared to other algorithms, decision trees require less effort for data preparation during pre-processing.
2. A decision tree does not require normalization of data.
3. A decision tree does not require scaling of data either.

48
Disadvantages
A small change in the data can cause a large change in the structure of the decision tree, causing instability.
Decision tree calculations can sometimes become far more complex than those of other algorithms.
A decision tree often takes more time to train than other models.

49
Decision Tree Regression: Implementation

# Assumes X (feature matrix) and y (target vector) are already defined
from sklearn.tree import DecisionTreeRegressor

# Fit a regression tree on the dataset
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)

# Predict the target for a new feature value, e.g. 6.5
regressor.predict([[6.5]])

50
Random Forest Regression

51
Ensemble Learning
An Ensemble method is a technique that combines the predictions
from multiple machine learning algorithms together to make
more accurate predictions than any individual model.
A model comprised of many models is called an Ensemble model.

52
Ensemble Learning
Types of Ensemble Learning:
1. Boosting.
2. Bootstrap Aggregation (Bagging).

53
Boosting
Boosting refers to a group of algorithms that utilize weighted averages to turn weak learners into stronger learners.
Boosting is all about "teamwork".
Each model that runs dictates which features the next model will focus on.
In boosting, as the name suggests, each model learns from the previous ones, which in turn boosts the learning.
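A minimal sketch of a boosting regressor with scikit-learn, assuming training arrays X and y are already defined as in the earlier examples; GradientBoostingRegressor is one common boosting implementation, not the only one:

from sklearn.ensemble import GradientBoostingRegressor

# Each new tree focuses on the residual errors of the previous trees
booster = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
booster.fit(X, y)          # X, y assumed defined
booster.predict([[6.5]])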

54
Boosting

55
Bootstrap Aggregation (Bagging)
Bootstrap refers to random sampling with replacement.
Bootstrap allows us to better understand the bias and the
variance with the dataset.
Bootstrap involves random sampling of small subset of data
from the dataset.
It is a general procedure that can be used to reduce the variance of algorithms that have high variance, typically decision trees.
Bagging makes each model run independently and then
aggregates the outputs at the end without preference to any
model.
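A minimal sketch of drawing a bootstrap sample (random sampling with replacement) with NumPy; the dataset is illustrative:

import numpy as np

rng = np.random.default_rng(0)
data = np.array([10, 20, 30, 40, 50])

# Sample with replacement: same size as the original, duplicates allowed
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)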
56
Bootstrap Aggregation (Bagging)

57
Random Forest Regression

58
Random Forest Regression
Random forest is a Supervised Learning algorithm which
uses ensemble learning method for classification and
regression.
Random forest is a bagging technique and not a boosting
technique.
The trees in random forests are run in parallel.
There is no interaction between these trees while building
the trees.
It operates by constructing a multitude of decision trees at training time and outputting the mean of the individual trees' predictions (for regression).
59
Steps : Random Forest Regression
1. Pick at random k data points from the training set.
2. Build a decision tree associated to these k data
points.
3. Choose the number N of trees you want to build and
repeat steps 1 and 2.
4. For a new data point, make each one of your N trees predict the value of y for the data point in question, and assign the new data point the average of all the predicted y values.

60
Advantages
It is one of the most accurate learning algorithms available. For
many data sets, it produces a highly accurate classifier.
It runs efficiently on large databases.
It can handle thousands of input variables without variable
deletion.
It gives estimates of which variables are important in the classification.
It generates an internal unbiased estimate of the generalization
error as the forest building progresses.
It has an effective method for estimating missing data and
maintains accuracy when a large proportion of the data are
missing.
61
Disadvantages
Random forests have been observed to overfit for some
datasets with noisy classification/regression tasks.
For data including categorical variables with different numbers of levels, random forests are biased in favor of those attributes with more levels.
Therefore, the variable importance scores from
random forest are not reliable for this type of data.

62
Implementation
# Assumes X (feature matrix) and y (target vector) are already defined
from sklearn.ensemble import RandomForestRegressor

# 10 trees, each trained on a bootstrap sample of the data
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(X, y)

# Average prediction of all trees for a new feature value
regressor.predict([[6.5]])

63
References
Machine Learning course material by Andrew Ng
https://towardsdatascience.com/
https://www.edureka.co/
www.analyticsvidhya.com
https://www.saedsayad.com/support_vector_machine_reg.htm
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

64
