
Univariate analysis (one variable): one sample t test, z test for the proportion

Bivariate analysis: compare two population means, proportions

Regression analysis: consider the dependence of ONE variable (the dependent variable) on one or more other variables (the independent variables).

One dependent and one independent variable: simple (single) regression model.

More than one independent variable: multiple regression model.

Example: Consider the dependence of the score (GPA) on the English score (Eng).

Theoretically, the relationship is positive: the higher Eng, the higher GPA.

Now, how much does GPA change when Eng increases by 1 point? → use a model

Mathematical model: linear model: GPA = B0 + B1*Eng (a line)

B0: intercept: when Eng is 0, the average GPA is B0 (this does not make sense in this case because no observation has Eng = 0).

B1: rate of change of GPA with respect to Eng: if Eng increases by 1 unit, the average GPA increases by B1 units.

Regression model (econometric model)

GPAi = B0 + B1*Engi + ui, where ui is the random error

In this model, the variable Eng affects GPA.

To estimate the values of B0 and B1, we need data. We then use the method of Ordinary Least Squares (OLS) to estimate these coefficients.

Assume the estimated model is: Y^ = b0 + b1*X

             Coefficients
Intercept    2.846428571
Eng          0.092857143

Estimated model is: GPA^ = 2.85 + 0.093*Eng

The slope = 0.093: one more point of English score increases the average GPA by 0.093 points.
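As a rough illustration of how OLS produces b0 and b1 in the simple model, the sketch below applies the standard closed-form formulas b1 = Sxy/Sxx and b0 = Ybar - b1*Xbar. The data values and the function name `ols_simple` are made up for illustration; they are not the data behind the coefficients reported here.

```python
def ols_simple(x, y):
    """Return (b0, b1) of the fitted line y^ = b0 + b1*x using the OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


# Hypothetical scores (NOT the lecture's data set)
eng = [5.0, 6.0, 7.0, 8.0, 9.0]
gpa = [3.1, 3.4, 3.3, 3.8, 3.9]

b0, b1 = ols_simple(eng, gpa)
print(f"GPA^ = {b0:.3f} + {b1:.3f}*Eng")
```

The slope is read the same way as in the notes: each extra point of Eng changes the fitted GPA by b1.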
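* The coefficient of determination, R2: measures the goodness of fit of the model

R2 = RSS/TSS, where RSS is the regression (explained) sum of squares and TSS is the total sum of squares

0 ≤ R2 ≤ 1

R Square 0.144691781
14.47% of the variation of GPA can be explained by the English score (i.e. by the model).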
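To make the definition of R2 concrete, here is a minimal sketch that fits a simple OLS line and computes R2 in two equivalent ways: explained sum of squares over TSS, and 1 minus residual sum of squares over TSS. The data and the function name `r_squared` are hypothetical, chosen only for illustration.

```python
def r_squared(x, y):
    """Fit y^ = b0 + b1*x by OLS; return (explained SS / TSS, 1 - residual SS / TSS)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    tss = sum((yi - y_bar) ** 2 for yi in y)                    # total variation
    explained = sum((fi - y_bar) ** 2 for fi in fitted)         # explained by the model
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained
    return explained / tss, 1 - residual / tss


# Hypothetical scores (NOT the lecture's data set)
eng = [5.0, 6.0, 7.0, 8.0, 9.0]
gpa = [3.1, 3.4, 3.3, 3.8, 3.9]

r2_explained, r2_residual = r_squared(eng, gpa)
print(round(r2_explained, 4), round(r2_residual, 4))
```

Both numbers agree for an OLS fit with an intercept, which is why textbooks can define R2 either way.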

* Test for the significance of a coefficient (independent variable): t test.

Hypothesis: Ho: B1 = 0; H1: B1 ≠ 0

Use the t statistic: t = b1/se(b1). Reject Ho if t > t(n-k, α/2) or t < -t(n-k, α/2),

in which k is the number of parameters of the model (equivalently, the number of variables including the dependent variable); here k = 2.

If we use software, each t value is reported with its p-value, so we can use the p-value for this test: reject Ho if p-value < α.

Example:

             t Stat      P-value
Intercept    5.043193    0.003956
Eng          0.919699    0.399926

Test for the significance of the English score: t = 0.920, p-value = 0.3999 > α = 0.05

→ Do not reject Ho (B1 = 0) → the English score is not significant.
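The p-value decision rule used in this example can be sketched in a few lines. The t statistics and p-values are the ones reported in the regression output; the function name `reject_h0` and the dictionary layout are illustrative choices only.

```python
ALPHA = 0.05  # significance level used in the notes


def reject_h0(p_value, alpha=ALPHA):
    """Two-sided t test: reject Ho (coefficient = 0) when the p-value is below alpha."""
    return p_value < alpha


# (t Stat, P-value) pairs as reported in the regression output
results = {
    "Intercept": (5.043193, 0.003956),
    "Eng": (0.919699, 0.399926),
}

for name, (t_stat, p_value) in results.items():
    verdict = "significant" if reject_h0(p_value) else "not significant"
    print(f"{name}: t = {t_stat:.3f}, p-value = {p_value:.4f} -> {verdict}")
```

Comparing the p-value with α leads to the same decision as comparing |t| with the critical value t(n-k, α/2).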


* F test for model significance
Ho: B1 = 0 (the model is not significant); H1: B1 ≠ 0. The intercept B0 is not part of this hypothesis.

F             Significance F
0.84584585    0.399926363

F has p-value = 0.3999 > 0.05 → the model is not significant
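In a simple regression, the F test and the t test on the slope are the same test: F equals t squared and the two p-values coincide (both are 0.3999 here). The check below uses the reported numbers. The relation F = [R2/(k-1)] / [(1-R2)/(n-k)] is a standard result that the notes do not state, and the residual degrees of freedom it implies are an inference, not a figure given in the notes.

```python
t_eng = 0.919699      # t Stat for Eng from the output
f_stat = 0.84584585   # F from the output
r2 = 0.144691781      # R Square from the output
k = 2                 # number of parameters: B0 and B1

# F should match the squared t statistic of the slope
print(round(t_eng ** 2, 4), round(f_stat, 4))

# Rearranging F = [R2/(k-1)] / [(1-R2)/(n-k)] for the residual degrees of freedom n-k
df_resid = f_stat * (k - 1) * (1 - r2) / r2
print(round(df_resid, 2))
```

This equivalence holds only with one independent variable; in a multiple regression the F test examines all slope coefficients jointly.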
* Prediction: given a value of the independent variable, Eng = 7.5, predict the new value of GPA by plugging the value of Eng into the model.
GPAo^ = 2.85 + 0.093*7.5 = 3.55
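The prediction step is just an evaluation of the fitted line. A minimal sketch, using the rounded estimates from the notes (the function name `predict_gpa` is illustrative):

```python
B0_HAT = 2.85    # rounded intercept estimate
B1_HAT = 0.093   # rounded slope estimate


def predict_gpa(eng):
    """Point prediction of GPA for a given English score."""
    return B0_HAT + B1_HAT * eng


print(round(predict_gpa(7.5), 2))
```

With the unrounded coefficients (2.846428571 and 0.092857143) the same calculation gives about 3.54, so the last digit depends on rounding.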

Part 2: Multiple Regression model:

Assume we have 1 dependent variable and k-1 independent variables

Model: Yi = B0 + B1*X1 + B2*X2 + … + Bk-1*Xk-1 + ui

This is a model with k parameters (k variables, counting the dependent variable).

To estimate it, we still use the OLS method. The estimated model is:

Y^ = b0 + b1*X1 + … + bk-1*Xk-1

Interpretation of bj (the coefficient of Xj): holding the other independent variables fixed, if Xj increases by 1 unit, the average of Y increases (or decreases, if bj is negative) by |bj| units.
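For readers who want to see what OLS does with several independent variables, the sketch below solves the normal equations (X'X)b = X'y with plain Gauss-Jordan elimination. This is one possible way to compute the estimates, not necessarily what statistical software does internally. The data are hypothetical and generated exactly from Y = 1 + 2*X1 + 3*X2, so the fit should recover those coefficients.

```python
def ols_multiple(rows, y):
    """Solve the normal equations (X'X) b = X'y; each row holds one observation's X values."""
    X = [[1.0] + list(r) for r in rows]   # prepend the constant term for b0
    k = len(X[0])                         # number of parameters
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(xi[p] * xi[q] for xi in X) for q in range(k)]
        + [sum(xi[p] * yi for xi, yi in zip(X, y))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[p][k] / A[p][p] for p in range(k)]


# Hypothetical data built from Y = 1 + 2*X1 + 3*X2 with no error term
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y_vals = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]

coefs = ols_multiple(x_rows, y_vals)
print([round(c, 6) for c in coefs])
```

Each returned value is read as in the notes: coefs[j] is the change in the fitted Y for a one-unit increase in Xj with the other variables held fixed.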

* t test for the significance of Bj: Ho: Bj = 0; t = bj/se(bj); reject Ho if t > t(n-k, α/2) or t < -t(n-k, α/2)

In practice we use the p-value for these tests.


* F test for model significance:
Ho: B1 = B2 = … = Bk-1 = 0; H1: at least one of these coefficients is not 0. Use the p-value.
* Prediction: given Xj = xj0 for every j, plug these values into the estimated model to obtain Yo^.
* R2: % of the variation of Y that can be explained by the model.
* Adjusted R square, Rbar2: this value can be used in place of R2.
R2 never decreases when we add more independent variables to the model, but Rbar2 can increase, decrease, or even be negative.
How to use Rbar2: we use it to decide whether to add one more independent variable to the model: if Rbar2 increases after adding the variable, we should add it. Otherwise, we should not.

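Adjusted Multiple Coefficient of Determination

$$R_a^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k}$$

(Ra2 is the adjusted R square, written Rbar2 in these notes; n is the sample size and k is the number of parameters.)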
Example: assume we use n = 25.
Model (1): Y = B0 + B1*X1 + u has R2 = 0.72
Model (2): Y = B0 + B1*X1 + B2*X2 + u has R2 = 0.78

Should we use model (1) or model (2)? (Which model is better?)

Model (1) has Rbar2 = 1-(1-0.72)*((25-1)/(25-2)) = 0.708

Model (2) has Rbar2 = 1-(1-0.78)*((25-1)/(25-3)) = 0.76

Comparing the two values of Rbar2, we should use model (2).
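The comparison above can be reproduced with a small helper. This is only a sketch of the adjusted R square formula from the notes; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R square; k counts all parameters, including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


model_1 = adjusted_r2(0.72, n=25, k=2)   # Y = B0 + B1*X1 + u
model_2 = adjusted_r2(0.78, n=25, k=3)   # Y = B0 + B1*X1 + B2*X2 + u

print(round(model_1, 3), round(model_2, 3))
print("prefer model (2)" if model_2 > model_1 else "prefer model (1)")
```

Because the penalty factor (n-1)/(n-k) grows with k, a new variable raises the adjusted value only if it improves R2 by enough to offset the lost degree of freedom.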

Example: Use SPSS to run the regression model with the data "Consumer" in Chapter 15.

Model: Amount = B0 + B1*Income + B2*Size + u

Expect: B1, B2 > 0
Model Summary

Model    R        R Square    Adjusted R Square    Std. Error of the Estimate
1        .909a    .826        .818                 398.091

a. Predictors: (Constant), Household Size, Income

Amount^ = 1304.9 +33.13*Income+ 356.3*Size


When Size is held fixed, if Income increases by $1000/year, the average Amount increases by $33.13.
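The "other variables fixed" reading of a coefficient can be checked directly on the estimated model. In this sketch Income is assumed to be measured in $1000 per year, as the interpretation implies, and the household values used are hypothetical.

```python
def predict_amount(income, size):
    """Fitted Amount from the estimated model; income assumed in $1000/year."""
    return 1304.9 + 33.13 * income + 356.3 * size


base = predict_amount(income=40, size=3)     # hypothetical household
richer = predict_amount(income=41, size=3)   # same Size, Income one unit higher

print(round(base, 2))
print(round(richer - base, 2))
```

The difference between the two predictions equals the Income coefficient, whatever starting values of Income and Size are chosen.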
The standardized coefficients (Beta) tell us the relative importance of the independent variables for the dependent variable: the larger the absolute value of Beta, the more important the variable.
R2 = 0.826: 82.6% of the variation of Amount can be explained by the model.

The p-values of both coefficients are very small → both independent variables are significant.
