Linear Regression


Lecture No:

Regression
19/09/2014 1

Presented by: Dr Mustansar Ali Ghazanfar
Mustansar.ali@uettaxila.edu.pk
UET Taxila, Pakistan

Motivational Picture!
Study Tour
Agenda

Regression Vs. Classification
Regression concepts & Simple Linear Regression
Least Square Methods
Application for predictions (Examples)
Pros and Cons
Recap
[Figure: Machine Learning branches into Supervised Learning and Unsupervised Learning, with a "Today!" marker on the diagram.]
Remember the Problem
[Figure: training set with a feature matrix X of m samples (rows) by n features (columns) and a label column y with one entry per sample.]
Remember the Problem
[Figure: the same training set, highlighting the i-th sample x^(i) and its label y^(i).]
Supervised Learning
[Figure: training set with features X (m samples by n features) and labels y.]
Unsupervised Learning
[Figure: training set with features X (m samples by n features) only, without labels.]

Bigger Picture
Code Web:
Improving Life for Future Tas
John, Andy, Chris, Leo
Linear Regression
Model
Model?
Regression Analysis
In regression analysis we analyze the relationship between two or more variables.

The relationship between two or more variables can be linear or non-linear.

How can we use the available data to investigate such a relationship?

If a relationship exists, how can we use it to forecast the future?
Types of Regression Models
Regression Models
  1 Explanatory Variable: Simple
    Linear
    Non-Linear
  2+ Explanatory Variables: Multiple
    Linear
    Non-Linear
Linear Equations
Y = mX + b
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
Simple Regression Analysis
A linear relationship between two variables is stated as

$y = \beta_0 + \beta_1 x$

This is the general equation for a line.

$\beta_0$: the intersection with the y axis
$\beta_1$: the slope
$x$: the independent variable
$y$: the dependent variable
Application Area, Example 1
For example, advertising could be the independent variable and sales the dependent variable.

We first analyze the available data to develop a relationship between sales and advertising.

$\text{Sales} = \beta_0 + \beta_1 (\text{Advertising})$

After estimating $\beta_0$ and $\beta_1$, we use this relationship to forecast sales for a specific level of advertising.

We can answer questions such as: how much will we sell if we spend a specific amount on advertising?
Application Area, Example 2
A nationwide chain of pizza restaurants.

The most successful branches are close to college campuses.

Question: are quarterly sales closely related to the size of the student population?

That is, do restaurants near campuses with a large student population tend to have more sales than those near campuses with a small population?

x: student population
y: quarterly sales
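Simple Linear Regression Model
For the time being, forget the error term $\varepsilon$. The following equation describes how the mean value of y is related to x.

$E(y) = \beta_0 + \beta_1 x$

$\beta_0$ is the intersection with the y axis and $\beta_1$ is the slope.

[Figure: three panels showing the regression line for $\beta_1 > 0$, $\beta_1 < 0$, and $\beta_1 = 0$.]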
Estimated Linear Regression Equations
If we knew the values of $\beta_0$ and $\beta_1$, then given any campus population we could plug it into the equation and find the mean value of sales.

$E(y) = \beta_0 + \beta_1 x$

But we do not know the values of $\beta_0$ and $\beta_1$.

We have to estimate them.

We estimate them using past data (training data).

We estimate $\beta_0$ by $b_0$ and $\beta_1$ by $b_1$.
Estimated Linear Regression Equations
Simple Linear Equation

$E(y) = \beta_0 + \beta_1 x$

Estimate of the Simple Linear Equation

$\hat{y} = b_0 + b_1 x$
The Estimating Process in Simple Linear Regression
1. Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
   Regression equation: $E(y) = \beta_0 + \beta_1 x$
   Unknown parameters: $\beta_0$ and $\beta_1$
2. Training data: pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
3. Estimated regression equation: $\hat{y} = b_0 + b_1 x$
   Sample statistics: $b_0$ and $b_1$
4. $b_0$ and $b_1$ provide estimates for $\beta_0$ and $\beta_1$

Example: Pizza Restaurant
We collect data from a random set of stores in our pizza restaurant example.

Restaurant i    Student population x_i (1000s)    Quarterly sales y_i ($1000s)
1               2                                 58
2               6                                 105
3               8                                 88
4               8                                 118
5               12                                117
6               16                                137
7               20                                157
8               20                                169
9               22                                149
10              26                                202
Scatter Diagram
Fitting the Model
Question (for any data)
1. Plot of all $(x_i, y_i)$ pairs
2. Suggests how well model will fit
[Figure: scatter plot of y against x, both axes running from 0 to 60.]
Volunteer
Thinking Challenge
[Figure: scatter plot of y against x, both axes running from 0 to 60.]
How would you draw a line through the points?
How do you determine which line fits best?
The Least Square Solution
[Figure: plots of Y against X.]
Fitting data to a linear model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ are the residuals.
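Formula for the Least Squares Estimates
Slope: $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$

y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

where $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $SS_{xx} = \sum (x_i - \bar{x})^2$

n = sample size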
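The two formulas above can be tried out directly. The following is a minimal sketch in plain Python, applied to the pizza restaurant data listed earlier in the lecture; the function name `least_squares_fit` and the variable names are illustrative choices, not part of the slides.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) using the SS_xy / SS_xx formulas."""
    n = len(x)                      # sample size
    x_bar = sum(x) / n              # mean of x
    y_bar = sum(y) / n              # mean of y
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Pizza restaurant data: student population (1000s), quarterly sales ($1000s)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

intercept, slope = least_squares_fit(population, sales)
print(intercept, slope)
```

For this data SS_xy is 2840 and SS_xx is 568, so the slope comes out as 5 and the intercept as 60.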
Interpreting the Estimates of $\beta_0$ and $\beta_1$ in Simple Linear Regression
y-intercept: $\hat{\beta}_0$ represents the predicted value of y when x = 0
slope: $\hat{\beta}_1$ represents the increase (or decrease) in y for every 1-unit increase in x
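Least Squares Graphically
[Figure: four data points around the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, with residuals $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4$; for example, $y_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 + \hat{\varepsilon}_2$.]

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$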
Estimated Linear Regression Equation
We want to estimate the relationship between student population and the mean value of sales.

We may rely on our own judgment and draw a line to fit the data points.

Then we measure the intersection with the y axis, which is $b_0$, and the slope, which is $b_1$.
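Graphical - Judgmental Solution
[Figure: line drawn by eye through the scatter diagram, marked with intercept $b_0$ and slope $b_1$ (rise per 1-unit increase in x).]

Y = 55 + 6x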
The Least Square Method
We can implement the same approach algebraically by minimizing the sum of the squares of the differences between the observed values and the values on the regression line.

This is the Least Square Method.
The Least Square Method

$y_i$    $x_i$    $\hat{y}_i$
$y_1$    $x_1$    $b_0 + b_1 x_1$
$y_2$    $x_2$    $b_0 + b_1 x_2$
$y_3$    $x_3$    $b_0 + b_1 x_3$
...      ...      ...
$y_n$    $x_n$    $b_0 + b_1 x_n$

$\min Z = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

$\min Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
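To make the criterion Z concrete, the sketch below evaluates it for two candidate lines on the pizza restaurant data: the judgmental line Y = 55 + 6x and the line with intercept 60 and slope 5 that the lecture obtains later by least squares. The function name `sum_squared_errors` is an illustrative choice.

```python
def sum_squared_errors(b0, b1, x, y):
    """Z = sum over i of (y_i - b0 - b1 * x_i)^2 for a candidate line."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Pizza restaurant data: student population (1000s), quarterly sales ($1000s)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

z_judgmental = sum_squared_errors(55, 6, population, sales)
z_least_squares = sum_squared_errors(60, 5, population, sales)
print(z_judgmental, z_least_squares)
```

The judgmental line gives Z = 2908 and the second line gives Z = 1530, so the line drawn by eye fits noticeably worse under this criterion.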
Classical Minimization
$\min Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

We want to minimize this function with respect to $b_0$ and $b_1$. This is a classic optimization problem.

To find the minimum, we take the derivatives and set them equal to zero.
The Least Square Method

$y_i$    $x_i$    $\hat{y}_i$
$y_1$    $x_1$    $b_0 + b_1 x_1$
$y_2$    $x_2$    $b_0 + b_1 x_2$
$y_3$    $x_3$    $b_0 + b_1 x_3$
...      ...      ...
$y_n$    $x_n$    $b_0 + b_1 x_n$

Note: our unknowns are $b_0$ and $b_1$. $x_i$ and $y_i$ are known; they are our data.

$Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

Find the derivatives of Z with respect to $b_0$ and $b_1$ and set them equal to zero.
Derivatives
$Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

$\dfrac{\partial Z}{\partial b_0} = \sum_{i=1}^{n} 2(-1)(y_i - b_0 - b_1 x_i) = 0$

$\dfrac{\partial Z}{\partial b_1} = \sum_{i=1}^{n} 2(-x_i)(y_i - b_0 - b_1 x_i) = 0$
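As a numerical check of these two conditions, the sketch below evaluates both derivatives on the pizza restaurant data. It assumes the values b_0 = 60 and b_1 = 5 that the pizza example arrives at later in the lecture, and compares them with the judgmental line Y = 55 + 6x; the function name `gradient_of_z` is an illustrative choice.

```python
def gradient_of_z(b0, b1, x, y):
    """Return (dZ/db0, dZ/db1) for Z = sum of (y_i - b0 - b1 * x_i)^2."""
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    dz_db0 = sum(2 * (-1) * r for r in residuals)
    dz_db1 = sum(2 * (-xi) * r for xi, r in zip(x, residuals))
    return dz_db0, dz_db1


# Pizza restaurant data: student population (1000s), quarterly sales ($1000s)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

grad_least_squares = gradient_of_z(60, 5, population, sales)
grad_judgmental = gradient_of_z(55, 6, population, sales)
print(grad_least_squares, grad_judgmental)
```

Both derivatives are zero at (60, 5), while at (55, 6) they are not, which is consistent with the judgmental line not being the minimizer.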
Not needed for the exam
$b_0$ and $b_1$

$b_1 = \dfrac{\sum xy - (\sum x \sum y)/n}{\sum x^2 - (\sum x)^2/n}$

$b_0 = \bar{y} - b_1 \bar{x}$
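The step from the two derivative conditions to these closed-form estimates can be sketched as follows. This is standard algebra rather than something the slides spell out; all sums run over i = 1, ..., n.

```latex
\begin{align*}
\sum (y_i - b_0 - b_1 x_i) = 0
  &\;\Rightarrow\; \sum y_i = n b_0 + b_1 \sum x_i
  \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\sum x_i (y_i - b_0 - b_1 x_i) = 0
  &\;\Rightarrow\; \sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2 \\
\text{substituting } b_0:\quad
  \sum x_i y_i &= \frac{\sum x_i \sum y_i}{n} - b_1 \frac{(\sum x_i)^2}{n} + b_1 \sum x_i^2 \\
b_1 &= \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}
\end{align*}
```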
Example

Restaurant i    x_i    y_i     x_i y_i    x_i^2
1               2      58      116        4
2               6      105     630        36
3               8      88      704        64
4               8      118     944        64
5               12     117     1404       144
6               16     137     2192       256
7               20     157     3140       400
8               20     169     3380       400
9               22     149     3278       484
10              26     202     5252       676
Total           140    1300    21040      2528
$b_1$

$b_1 = \dfrac{\sum xy - (\sum x \sum y)/n}{\sum x^2 - (\sum x)^2/n}$

$b_1 = \dfrac{21{,}040 - (140)(1300)/10}{2528 - (140)^2/10}$

$b_1 = \dfrac{2840}{568} = 5$
$b_0$

$\bar{y} = b_0 + b_1 \bar{x}$

$\bar{y} = \dfrac{1300}{10} = 130$

$\bar{x} = \dfrac{140}{10} = 14$

$130 = b_0 + 5(14)$

$b_0 = 60$
Estimated Regression Equation

$\hat{y} = 60 + 5x$

Now we can predict. For example, suppose one of the restaurants of this pizza chain is close to a campus with 16,000 students. We predict that the mean of its quarterly sales is

$\hat{y} = 60 + 5(16)$
$\hat{y} = 140$ thousand dollars
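The whole pizza calculation, from the column totals to the prediction, can be reproduced with a short script. This is a minimal sketch in plain Python; the names `fit_line` and `predict` are illustrative, not from the slides.

```python
def fit_line(x, y):
    """Return (b0, b1) from the sums of x, y, x*y and x^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n    # b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_new):
    """Estimated mean of y at x_new from the fitted line."""
    return b0 + b1 * x_new


# Pizza restaurant data: student population (1000s), quarterly sales ($1000s)
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = fit_line(population, sales)
forecast = predict(b0, b1, 16)    # campus with 16,000 students
print(b0, b1, forecast)
```

The script gives b0 = 60, b1 = 5, and a forecast of 140 (thousand dollars), matching the hand calculation.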
Another Example:
Price Prediction of a Toy
* Solve the problem and win this teddy bear!
Example
You are a marketing analyst for a toy. You gather the following data:

Ad Expenditure ($100s)    Sales (Units)
1                         1
2                         1
3                         2
4                         2
5                         4

Find the least squares line relating sales and advertising.
[Figure: scatter plot "Sales vs. Advertising", with Advertising (0 to 5) on the horizontal axis and Sales (0 to 4) on the vertical axis.]
Question: Fit this equation to the given data

$\hat{y} = b_0 + b_1 x$

$b_1 = \dfrac{\sum xy - (\sum x \sum y)/n}{\sum x^2 - (\sum x)^2/n}$

$b_0 = \bar{y} - b_1 \bar{x}$

x = Ad Expenditure ($100s)    y = Sales (Units)
1                             1
2                             1
3                             2
4                             2
5                             4
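Parameter Estimation Solution

$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \dfrac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = .70$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.10$

$\hat{y} = -.1 + .7x$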
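The toy example can be checked with exact arithmetic. The sketch below uses Python's `fractions.Fraction` and multiplies the numerator and denominator of the slope formula by n so that everything stays in integers; the variable names are illustrative.

```python
from fractions import Fraction

# Toy data: ad expenditure ($100s) and sales (units)
ad_spend = [1, 2, 3, 4, 5]
units_sold = [1, 1, 2, 2, 4]

n = len(ad_spend)
sum_x = sum(ad_spend)                                      # 15
sum_y = sum(units_sold)                                    # 10
sum_xy = sum(x * y for x, y in zip(ad_spend, units_sold))  # 37
sum_x2 = sum(x * x for x in ad_spend)                      # 55

# Slope formula with numerator and denominator both multiplied by n
slope = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
# Intercept = y_bar - slope * x_bar
intercept = Fraction(sum_y, n) - slope * Fraction(sum_x, n)
print(slope, intercept)
```

This yields a slope of 7/10 and an intercept of -1/10, the same line as the worked solution.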
Coefficient Interpretation Solution
Slope ($\hat{\beta}_1$)
Sales volume (y) is expected to increase by 0.7 units for each $100 increase in advertising (x), over the sampled range of advertising expenditures from $100 to $500.
[Figure: "Regression Line Fitted to the Data", with Advertising (0 to 5) on the horizontal axis, Sales (0 to 4) on the vertical axis, and the fitted line $\hat{y} = -.1 + .7x$.]
Pros
Linear regression implements a statistical model that gives its best results when the relationships between the independent variables and the dependent variable are close to linear.
It is a primary tool for process modeling because of its effectiveness, completeness, and simplicity.
The theory associated with linear regression is well understood. Furthermore, many processes are inherently linear or, over short ranges, can be well approximated by a linear model.
Cons
Sensitivity to outliers: the method of least squares is very sensitive to the presence of unusual data points in the data used to fit a model. One or two outliers can sometimes seriously skew the results of a least squares analysis.
Linear regression is often inappropriately used to model non-linear relationships.
Linear regression is limited to predicting numeric output.
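The sensitivity to outliers can be illustrated with the toy advertising data. In the sketch below the last sales value is replaced by a made-up outlier (40 instead of 4); that value is hypothetical and chosen only to show the effect, and the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Return (b0, b1) from the least squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


ad_spend = [1, 2, 3, 4, 5]
units_sold = [1, 1, 2, 2, 4]            # toy data from the lecture
units_with_outlier = [1, 1, 2, 2, 40]   # hypothetical: last point changed

_, slope_clean = fit_line(ad_spend, units_sold)
_, slope_outlier = fit_line(ad_spend, units_with_outlier)
print(slope_clean, slope_outlier)
```

A single changed point moves the slope from 0.7 to 7.9, which is the kind of skew the first point above warns about.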
Logistic Regression
Inspired by Nature
Based on input, either send a signal or don't
[Figure: a neuron that takes an input and produces an output.]
Logistic Function

$g(x) = \dfrac{1}{1 + e^{-x}}$

$g(0) = \dfrac{1}{1 + e^{0}} = 0.5$

$g(5) = \dfrac{1}{1 + e^{-5}} \approx 0.99$

$g(-5) = \dfrac{1}{1 + e^{5}} \approx 0.01$

[Figure: plot of g(x) against x, an S-shaped curve passing through 0.5 at x = 0, close to 1 at x = 5 and close to 0 at x = -5.]
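The three values on the slides can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `logistic` is an illustrative choice.

```python
import math


def logistic(x):
    """Logistic function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


for value in (0, 5, -5):
    print(value, round(logistic(value), 2))
```

Rounded to two decimals this gives 0.5, 0.99 and 0.01 for inputs 0, 5 and -5.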
What are the inputs?
Hypothesis Function
The inputs are the features $x_1, x_2, x_3, \ldots, x_n$. They are combined into $\theta^T x$ and passed through the logistic function:

$h(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
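A minimal sketch of the hypothesis function in Python follows. It assumes the parameters form a vector (called `theta` here) with one weight per input; the example weights and inputs are made up purely for illustration.

```python
import math


def hypothesis(theta, x):
    """h(x) = 1 / (1 + e^(-theta^T x)) for equal-length lists theta and x."""
    z = sum(t * xi for t, xi in zip(theta, x))   # theta^T x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameter and input values, for illustration only
theta = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
print(hypothesis(theta, x))
```

With all-zero parameters the output is 0.5 for any input, since the weighted sum is then 0.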
Logistic Regression
How to find parameters
Start from random parameter values.
Repeat: update the parameters using $\sum_{i=1}^{m} \left(h(x^{(i)}) - y^{(i)}\right) x^{(i)}$
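One common way to turn this into a procedure is batch gradient descent, sketched below. The slide text does not survive well enough to confirm the exact update rule, so the step size (`learning_rate`), the number of steps, the random initial range, and the tiny dataset are all assumptions made for illustration.

```python
import math
import random


def hypothesis(theta, x):
    """h(x) = 1 / (1 + e^(-theta^T x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, learning_rate=0.1, steps=1000, seed=0):
    """Start from random parameters, then repeat the gradient update."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    theta = [rng.uniform(-0.5, 0.5) for _ in samples[0]]
    for _ in range(steps):
        gradient = [0.0] * len(theta)
        for x, y in zip(samples, labels):
            error = hypothesis(theta, x) - y      # h(x^(i)) - y^(i)
            for j, xj in enumerate(x):
                gradient[j] += error * xj         # sum over the m samples
        theta = [t - learning_rate * g for t, g in zip(theta, gradient)]
    return theta


# Hypothetical dataset: first entry of each sample is a constant 1 (bias term)
samples = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]

theta = train(samples, labels)
predictions = [round(hypothesis(theta, x)) for x in samples]
print(theta, predictions)
```

On this small separable dataset the rounded predictions match the labels after training.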
Summary: The Linear Regression Model
Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$

Simple Linear Regression Equation
$E(y) = \beta_0 + \beta_1 x$

Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
Least Squares Criterion
$\min \sum (y_i - \hat{y}_i)^2$

where
$y_i$ = observed value of the dependent variable for the i-th observation
$\hat{y}_i$ = estimated value of the dependent variable for the i-th observation
Slope for the Estimated Regression Equation
$b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$

y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x}$

where
$x_i$ = value of the independent variable for the i-th observation
$y_i$ = value of the dependent variable for the i-th observation
$\bar{x}$ = mean value of the independent variable
$\bar{y}$ = mean value of the dependent variable
n = total number of observations
References & Further

1. http://faculty.ksu.edu.sa/73212/Publications/Forms/AllItems.aspx
