
Regression Analysis

BY
DR. ISMAIL B
PROFESSOR
DEPARTMENT OF STATISTICS
MANGALORE UNIVERSITY
MANGALAGANGOTHRI

e-mail: prof.ismailb@gmail.com

Descriptive Statistics

Using the p-value to make the decision

The p-value is a probability, computed assuming the null
hypothesis is true, that the test statistic would take a value as
extreme as or more extreme than the one actually observed.
Since it is a probability, it is a number between 0 and 1; the
closer it is to 0, the more unlikely the observed result is under
the null hypothesis.
So if the p-value is small, we reject the null hypothesis.

Using the p-value to make the decision

How small is small? Smaller than the level of significance,
usually 0.05 or 0.01. So, using the p-value to make the decision:

If 0.01 < p < 0.05: significant (S)

If p < 0.01: highly significant (HS)
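This decision rule can be sketched in Python; the function name and the 0.05/0.01 cutoffs follow the convention above, and the labels (S, HS, NS) are illustrative:

```python
def label_significance(p, alpha=0.05, alpha_high=0.01):
    """Classify a p-value with the usual cutoffs: < .01 highly
    significant (HS), < .05 significant (S), otherwise NS."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p < alpha_high:
        return "HS"
    if p < alpha:
        return "S"
    return "NS"

print(label_significance(0.03), label_significance(0.0004))  # S HS
```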

What is the relationship between the variables?

An equation is used, with:
one numerical dependent (response) variable, the quantity to be predicted: Y;
one or more numerical or categorical independent (explanatory) variables: X.
Different techniques are used for different levels of measurement.

Types of Regression Models

Regression models split first by the number of explanatory
variables, then by the form of the relationship:

Regression Models
- 1 explanatory variable: Simple regression
  - Linear
  - Non-linear
- 2+ explanatory variables: Multiple regression
  - Linear
  - Non-linear

Types of Regression Models

The same tree applies whether the dependent variable enters in
linear or log-linear form:

Regression Models (dependent variable: linear or log-linear)
- 1 explanatory variable: Simple regression (linear or non-linear)
- 2+ explanatory variables: Multiple regression (linear or non-linear)

Linear Equations

Y = bX + a

b = slope = change in Y / change in X
a = Y-intercept

The simple linear regression model is given by

Y = a + bX + e

Simple Linear Regression Model

The relationship between the variables is a linear function: the
straight line that best fits the data.

Yi = β0 + β1 Xi + εi

where Yi is the dependent (response) variable, Xi the independent
(explanatory) variable, β0 the Y-intercept (constant term), β1 the
slope, and εi the random error.

Linear Regression Model

[Figure: scatter of observed values around the fitted line]

Observed value:   Yi = β0 + β1 Xi + εi
Regression line:  E(Y|X) = β0 + β1 Xi

εi = random error, the vertical distance from the observed value
to the line.

Assumptions

1. E(εi) = 0: the disturbances have zero mean.
2. V(εi) = σ², i = 1, 2, ..., n: the disturbances have constant
   variance.
3. E(εi εj) = 0 for i ≠ j: the disturbances are uncorrelated.
4. The explanatory variable X is non-stochastic, i.e. fixed in
   repeated samples, and hence not correlated with the disturbances.
5. Σ_{t=1}^{n} x_t² / n ≠ 0 and has a finite limit as n → ∞.
   This assumption states that we have at least two distinct values of X.

The Sum of Squares

[Figure: the three deviations at a point Xi, relative to the mean Ȳ]

SST = Σ(Yi − Ȳ)²   (total)
SSR = Σ(Ŷi − Ȳ)²   (regression)
SSE = Σ(Yi − Ŷi)²  (error)

BLUE: the least squares estimator of β is

β̂ = Σ xi yi / Σ xi²   (xi, yi in deviations from their means)

α̂ = Ȳ − β̂ X̄

We can write the SLR model for all the observations as

Y = Xβ + ε,

where Y = (y1, y2, ..., yn)' and X is the n × 2 matrix whose rows
are (1, Xi). Then

β̂ = (X'X)⁻¹ X'Y,   V(β̂) = (X'X)⁻¹ σ²
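A minimal pure-Python sketch of the deviation-form formulas above (the function name and the sample data are illustrative):

```python
def ols_simple(X, Y):
    """OLS estimates for Y = alpha + beta*X + e, via deviation sums:
    beta-hat = sum(xi*yi)/sum(xi^2), alpha-hat = Ybar - beta-hat*Xbar."""
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
    sxx = sum((x - xbar) ** 2 for x in X)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

# A perfect line y = 1 + 2x recovers its own coefficients.
a, b = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```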

V(β̂) = σ² / Σ xi²

Estimation of σ²:

σ̂² = s² = Σ ei² / (n − 2),   where ei = Yi − α̂ − β̂ Xi   (residual)

Testing H0: β = 0:

t_obs = β̂ / S.E.(β̂),   S.E.(β̂) = √(s² / Σ xi²)

If |t_obs| > t_{α/2, n−2}, reject H0: β = 0 at the α% significance
level.

A (1 − α)100% (e.g. 95%) confidence interval for β is
β̂ ± t_{α/2, n−2} · se(β̂).
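The residual variance and slope t-statistic above can be sketched in a few lines of Python (function name and data are illustrative):

```python
import math

def slr_t_stat(X, Y):
    """Slope t-statistic for H0: beta = 0 in simple linear regression:
    t = beta-hat / sqrt(s^2 / sum(xi^2)), with s^2 = sum(ei^2)/(n-2)."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((x - xbar) ** 2 for x in X)
    beta = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
    alpha = ybar - beta * xbar
    e = [y - alpha - beta * x for x, y in zip(X, Y)]
    s2 = sum(ei ** 2 for ei in e) / (n - 2)  # sigma^2-hat on n-2 d.f.
    return beta / math.sqrt(s2 / sxx)

t = slr_t_stat([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(round(t, 2))  # 3.58
```

Compare |t| against t_{α/2, n−2} from a t-table to make the decision.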

A measure of fit:

ei = Yi − Ŷi,   Σ ei = 0   as long as there is a constant in the
regression.

R² = Σ ŷi² / Σ yi² = 1 − Σ ei² / Σ yi²   (yi, ŷi in deviations
from the mean)

(i) R² is the squared correlation between Y and Ŷ.
(ii) In simple regression, R² is the squared correlation between
Y and X.

If the intercept is not present, the uncentered R² is used as the
measure of fit:

uncentered R² = 1 − Σ ei² / Σ Yi²

The centered R² is

centered R² = 1 − Σ ei² / Σ (Yi − Ȳ)²

Prediction:

Y0 = α + β X0 + ε0

The BLUP of E(Y0) is

Ŷ0 = α̂ + β̂ X0,

using the Gauss-Markov result, with

V(Ŷ0) = σ² (1/n + (X0 − X̄)² / Σ xi²).

One can construct 95% confidence intervals for these predictions
for every value of X0, given by

Ŷ0 ± t_{0.025, n−2} · s · √(1 + 1/n + (X0 − X̄)² / Σ xi²),

where t_{0.025, n−2} is the 2.5% critical value of the
t-distribution with n − 2 d.f.
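The prediction interval above can be sketched as follows; the critical value t_{0.025, n−2} is passed in from a t-table, and the function name and data are illustrative:

```python
import math

def prediction_interval(X, Y, x0, t_crit):
    """95% prediction interval for a new Y at x0:
    Y0-hat +/- t_crit * s * sqrt(1 + 1/n + (x0 - xbar)^2 / sum(xi^2)).

    t_crit is the 2.5% critical value of the t-distribution with
    n-2 d.f., taken from a t-table (e.g. 3.182 for 3 d.f.)."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((x - xbar) ** 2 for x in X)
    beta = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
    alpha = ybar - beta * xbar
    s2 = sum((y - alpha - beta * x) ** 2 for x, y in zip(X, Y)) / (n - 2)
    y0 = alpha + beta * x0
    half = t_crit * math.sqrt(s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    return y0 - half, y0 + half

lo, hi = prediction_interval([1, 2, 3, 4, 5], [2, 3, 5, 4, 6], 3.0,
                             t_crit=3.182)
print(round(lo, 2), round(hi, 2))  # 1.23 6.77
```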

Example:

Annual consumption of 10 households, each selected randomly from a
group of households with a fixed personal disposable income. Both
income and expenditure are measured in thousands of rupees.

Solution:

Yi = α + β Xi + εi

β̂ = 0.8095 is the estimated marginal propensity to consume: the
extra consumption brought about by an extra rupee of disposable
income.

α̂ = Ȳ − β̂ X̄ = 6.5 − (0.8095)(7.5) = 0.4286.

This is the estimated consumption at zero personal disposable
income. The fitted values from this regression, the true values,
and the residuals are shown in the figure.

V̂(β̂) = s² / Σ xi² = 0.005941,   s² = 0.311905

SE(β̂) = 0.077078

and the estimated variance of α̂ is

V̂(α̂) = s² (1/n + X̄² / Σ xi²) = 0.365374,   SE(α̂) = 0.60446

The test statistic for H0: β = 0 is

t0 = β̂ / SE(β̂) = 10.50

p-value = P(|t8| > 10.5) < 0.0001, so we reject H0; hence X is
highly significant.

For H0: α = 0, t0 = 0.709, which is not significant since the
p-value = 0.498. Therefore we do not reject H0.
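The arithmetic on this slide can be reproduced directly from its reported figures (small differences come only from the slide's 4-digit rounding):

```python
import math

beta_hat = 0.8095            # slide's estimated MPC
ybar, xbar = 6.5, 7.5        # sample means from the example

alpha_hat = ybar - beta_hat * xbar   # ~0.4286, the slide's intercept
se_beta = math.sqrt(0.005941)        # ~0.077078
se_alpha = math.sqrt(0.365374)       # ~0.60446

t_beta = beta_hat / se_beta          # ~10.50 -> reject H0: beta = 0
t_alpha = alpha_hat / se_alpha       # ~0.709 -> do not reject H0: alpha = 0
print(round(t_beta, 1), round(t_alpha, 3))
```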

R² = r²_xy = (Σ xi yi)² / (Σ xi² Σ yi²) = 0.9324
   = 1 − Σ ei² / Σ yi² = 0.9324.

This means that personal disposable income explains 93.24% of the
variation in consumption.

The Sum of Squares

SST = Total Sum of Squares:
measures the variation of the Yi values around their mean Ȳ.

SSR = Regression Sum of Squares:
the explained variation, attributable to the relationship between
X and Y.

SSE = Error Sum of Squares:
the variation attributable to factors other than the relationship
between X and Y.

The Coefficient of Determination

r² = SSR / SST = regression sum of squares / total sum of squares

It measures the proportion of the variation in the dependent
variable explained by the regression line.
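The decomposition SST = SSR + SSE, and r² = SSR/SST, can be checked numerically with a short pure-Python sketch (function name and dataset are illustrative):

```python
def sums_of_squares(X, Y):
    """Return (SST, SSR, SSE) for the least-squares line of Y on X."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((x - xbar) ** 2 for x in X)
    beta = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
    alpha = ybar - beta * xbar
    yhat = [alpha + beta * x for x in X]
    sst = sum((y - ybar) ** 2 for y in Y)
    ssr = sum((yh - ybar) ** 2 for yh in yhat)
    sse = sum((y - yh) ** 2 for y, yh in zip(Y, yhat))
    return sst, ssr, sse

sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
assert abs(sst - (ssr + sse)) < 1e-9  # holds when an intercept is fitted
print(round(ssr / sst, 2))  # 0.81
```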

Simple Linear Regression

[Figures: worked simple linear regression example]

The multiple regression model is

Y = a0 + a1 X1 + a2 X2 + ... + an Xn + e,   e ~ N(0, σ²)

Y: response variable
X: explanatory variables
e: error

Assumptions:

Errors are independent (no autocorrelation)
Errors are normally distributed
Errors have zero mean and constant variance
No multicollinearity
Regressors are not random variables (fixed in repeated samples)

Multiple Regression


Regression diagnostics ask three questions:

Are the assumptions of multiple regression complied with?

Is the model adequate?

Is there anything unusual about any of the data points?

Plot the ACF of the residuals.

[Figure: residuals versus the fitted values (response is Crimrate)]

Durbin-Watson statistic (values range from 0 to 4).

Remedy?

Plot the residuals versus the fitted values.

Remedy?

Autocorrelated Regression

[Figure: residual plot showing autocorrelation]

Multicollinearity: check by means of the correlation matrix.

Symptoms: variance inflation, and large changes in the regression
coefficients when variables are added or deleted.
A variance inflation factor (VIF) > 4 indicates multicollinearity:

VIF = 1 / (1 − R²)

(The Durbin-Watson statistic, with values ranging from 0 to 4,
checks for autocorrelation rather than collinearity.)

Remedy?
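The VIF formula above is a one-liner; here R² is the value from regressing one explanatory variable on the remaining ones (the example values are illustrative):

```python
def vif(r_squared):
    """Variance inflation factor for one regressor: 1 / (1 - R^2),
    where R^2 comes from regressing it on the other regressors."""
    return 1.0 / (1.0 - r_squared)

print(vif(0.75))  # 4.0 -- at the slide's VIF > 4 rule-of-thumb boundary
print(vif(0.5))   # 2.0
```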

Logistic Regression

Logistic regression is a form of regression used when the
dependent variable is dichotomous (binary) and the independent
variables are of any type. Continuous variables are not used as
the dependent variable.
Logistic regression does not assume a linear relationship between
the dependent and independent variables, and does not assume
normality or homoscedasticity.
It does assume that the observations are independent and that the
independent variables are linearly related to the logit of the
dependent variable.
A scatter plot of the outcome variable (Y) versus an independent
variable shows all points falling on one of two parallel lines,
representing Y = 0 and Y = 1, so it does not provide a clear
picture of any linear relationship.
In linear regression the quantity E(Y|X) can take any value in the
range (−∞, ∞), whereas in logistic regression E(Y|X) lies in (0, 1).

Let π(x) = E(Y|X). The specific form of π(x) we use in the
logistic regression model is

π(x) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x)).

The logit transformation of π(x) is given by

g(x) = ln( π(x) / (1 − π(x)) ) = β0 + β1 x.

The logit g(x) is linear in the parameters, continuous, and may
range over (−∞, ∞) depending on the range of x. We may express the
value of the outcome variable given x as

y = π(x) + ε.
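A short sketch of π(x) and its logit; the coefficient values are illustrative, and the round trip shows that the logit of π(x) recovers the linear predictor β0 + β1·x:

```python
import math

def pi(x, b0, b1):
    """Logistic response: pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """Logit transform: g = ln(p / (1 - p)), linear in the parameters."""
    return math.log(p / (1.0 - p))

# pi(x) always lies in (0, 1), and logit(pi(x)) = b0 + b1*x.
b0, b1, x = -1.5, 0.8, 2.0
p = pi(x, b0, b1)
print(0.0 < p < 1.0, round(logit(p), 6))  # True 0.1
```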

Binary Logistic Regression

[Figures: worked binary logistic regression example]

Thanks !!!

