Session 5 - Lecture Slides
Regression Analysis
Session 5
ECON314
Introduction to Regression Inference
Approach:
Draw a specific random sample from the population and observe sample
statistics for that sample.
Use those observed sample statistics as the best guess of the unobserved
population parameters.
Ask whether it is likely to observe the sample statistics actually observed
if the unobserved population parameter equals a hypothesized value.
Desirable Properties of Simple Linear Regression Estimators
Two desirable properties of estimators are:
(1) Unbiased: an estimator is unbiased when the average value of all possible
estimates equals the true population value, i.e.

E(b1) = β1 and E(b2) = β2

If E(b1) ≠ β1, then the estimator is biased.

(2) Efficient: if both estimators are unbiased, then an estimator is more
efficient than another estimator if it has a lower variance.
A more efficient estimator is preferred to a less efficient
estimator because it does a better job estimating the true
population parameter (extreme values are less likely).
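Unbiasedness can be illustrated with a short simulation. The sketch below assumes a true intercept and slope (β1 = 2, β2 = 0.5) and a made-up fixed regressor; none of these numbers come from the slides. Averaging the OLS slope estimate over many random samples lands close to the true slope:

```python
import random

random.seed(0)

BETA1, BETA2 = 2.0, 0.5               # assumed true population parameters
X = [float(x) for x in range(1, 21)]  # fixed regressor values (assumed)

def ols_slope(y, x):
    """OLS slope: b2 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Draw many samples from the same population and collect the slope estimates.
estimates = []
for _ in range(5000):
    y = [BETA1 + BETA2 * xi + random.gauss(0, 1) for xi in X]
    estimates.append(ols_slope(y, X))

mean_b2 = sum(estimates) / len(estimates)
print(round(mean_b2, 2))  # close to the true slope 0.5: E(b2) = beta2
```

The spread of `estimates` around the true value is exactly the variance that the efficiency criterion compares across estimators.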
1. Properties of the OLS estimator
2. Confidence intervals
3. Tests of significance
   a. Critical value method
   b. p-value method
   c. F-distribution method
STANDARD ERRORS
var(b1) = σ² ΣXi² / [ n Σ(Xi − X̄)² ],    se(b1) = √var(b1)

var(b2) = σ² / Σ(Xi − X̄)²,    se(b2) = √var(b2)
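These formulas can be sketched in a few lines of Python. The toy data below and the usual residual-based estimate σ̂² = Σeᵢ²/(n − 2) are assumptions for illustration, not taken from the slides:

```python
import math

# Toy data (assumed): simple linear regression of y on x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)  # sum of (Xi - Xbar)^2

# OLS estimates of intercept (b1) and slope (b2)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar

# Estimate the error variance from the residuals: sigma2_hat = sum(e^2)/(n - 2)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
sigma2 = sum(e ** 2 for e in residuals) / (n - 2)

# Variances and standard errors from the formulas above
var_b1 = sigma2 * sum(xi ** 2 for xi in x) / (n * sxx)
var_b2 = sigma2 / sxx
se_b1, se_b2 = math.sqrt(var_b1), math.sqrt(var_b2)
print(round(se_b1, 4), round(se_b2, 4))
```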
1. Confidence Interval
2. Critical Value
3. P-value
Pairs of hypotheses

3. H0: β = 0
   H1: β > 0
   or
   H0: β ≤ 0
   H1: β > 0 (a positive relationship, e.g. a normal good)
Interval Estimation
Because sample statistics vary from sample to sample, it is often more
informative to specify the interval into which the coefficient is likely to fall.
Estimating a population parameter with a confidence interval is often
preferable to using a point estimate because it provides additional
information about the variability of the estimate.
CONFIDENCE INTERVALS
[Figure: a confidence interval centred on the point estimate, running from the
lower confidence limit to the upper confidence limit; the distance between the
two limits is the width of the confidence interval.]
The interval is constructed as bk ± tc · se(bk)
Where:
o bk is the estimated coefficient/point estimate.
o se(bk) is its estimated standard error.
o tc is a critical value from the t-distribution, using the area for two tails
and df = n − k.
Example 1: Constructing a
Confidence Interval
Construct a 95% confidence interval for the effect of the price of
chicken on quantity consumed:
cc = 73.08 − 1.95pc + 1.15pb + 0.1Y        n = 40
     (11.12)  (0.538)   (0.288)   (0.040)

95% CI: −3.037 ≤ βpc ≤ −0.863
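The interval can be reproduced directly. The critical value tc = 2.021 is an assumption here (a two-tailed 5% value read from a t-table; the slide does not state the df it used):

```python
# 95% CI for the price-of-chicken coefficient: bk +/- tc * se(bk)
b_pc = -1.95    # estimated coefficient from the regression
se_pc = 0.538   # its standard error
t_c = 2.021     # two-tailed 5% t critical value (assumed df, from a t-table)

lower = b_pc - t_c * se_pc
upper = b_pc + t_c * se_pc
print(round(lower, 3), round(upper, 3))  # -3.037 -0.863
```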
Using a confidence interval to test a
hypothesis
Confidence intervals can thus be used to test hypotheses about the
value of individual coefficients:
Example
Suppose we postulate that:
H0: βpc = 0
H1: βpc ≠ 0

95% CI: −3.037 ≤ βpc ≤ −0.863

We must reject the hypothesis that the true value of the parameter is zero,
since zero lies outside of the 95% CI.

If the hypothesised value of βk lies within the CI, then the hypothesised
value is plausible, and we cannot reject it (at the chosen confidence level).
If the hypothesised value of βk lies outside the CI, then we must reject the
hypothesis that this is the true value of the parameter.

[Figure: number line showing the 95% CI from −3.037 to −0.863; reject H0 if βk
lies outside this interval (as 0 does here), do not reject H0 if βk lies
within it.]
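This decision rule reduces to a one-line check on the interval endpoints:

```python
lower, upper = -3.037, -0.863  # 95% CI for beta_pc from the example
beta_null = 0.0                # hypothesised value under H0

in_interval = lower <= beta_null <= upper
print("do not reject H0" if in_interval else "reject H0")  # reject H0
```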
CRITICAL VALUES
TESTS OF SIGNIFICANCE
t-distribution
Tests of significance can also be conducted using various probability
distributions. We will use the t-distribution as an example.
It involves determining a range of values of t for which one must
reject the null hypothesis. This is known as the rejection region.
If the test statistic falls into the rejection region, then reject H0
(and vice versa).
t-distribution
2. A left-tailed test: This occurs when the alternative hypothesis is of the
form
H1: β < * (the rejection region lies in the negative tail).
3. A right-tailed test: This occurs when the alternative hypothesis is of the
form
H1: β > * (the rejection region lies in the positive tail).
TESTING HYPOTHESES ABOUT INDIVIDUAL
COEFFICIENTS
1. Start by writing H0 and H1
2. Then use the chosen method to decide which of the two
hypotheses to support.
1. The critical value method
a. Decide on the significance level
b. Look up a critical value, tC, (or use the rule-of-thumb).
c. Calculate the test statistic (or use the one given by Stata)
d. Compare the test statistic to the critical value and make a decision:
cc = 73.08 − 1.95pc        n = 40
     (11.12)  (0.538)

1. Our hypothesis is that there is no relationship between the price of
chicken and the consumption of chicken:
H0: βpc = 0    H1: βpc ≠ 0 (two-tailed test)

t = (bpc − 0) / se(bpc) = −1.95 / 0.538 = −3.62

At the 1% significance level (99% confidence, 0.5% in each tail), the critical
values are −2.704 and 2.704.

t is more extreme than tc (it lies in the rejection region on the left-hand
side), so reject the null hypothesis.
Price does affect consumption, at the 1% significance level.
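The steps above can be sketched as follows (the critical value ±2.704 is the one quoted on the slide):

```python
# Critical value method: compare the t statistic to the 1% critical value.
b_pc, se_pc = -1.95, 0.538
beta_null = 0.0                       # H0: beta_pc = 0
t_stat = (b_pc - beta_null) / se_pc   # test statistic
t_c = 2.704                           # 1% two-tailed critical value (from slide)

reject = abs(t_stat) > t_c            # two-tailed decision rule
print(round(t_stat, 2), reject)       # -3.62 True
```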
Example b.
cc = 73.08 + 1.15pb        n = 40
     (11.12)  (0.288)

H0: βpb = 1
H1: βpb > 1 (right-tailed test)

t = (1.15 − 1) / 0.288 = 0.52 < tc = 2.423
(1% level, 99% confidence; the rejection region lies to the right of tc = 2.423)

The calculated t does not fall into the rejection region, therefore do not
reject the null hypothesis.
A one-unit increase in the price of beef does not increase the consumption of
chicken by more than one unit.
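The same decision rule, now right-tailed, with the critical value 2.423 quoted on the slide:

```python
# Right-tailed test of H0: beta_pb = 1 against H1: beta_pb > 1.
b_pb, se_pb = 1.15, 0.288
beta_null = 1.0                       # hypothesised value, not zero this time
t_stat = (b_pb - beta_null) / se_pb   # test statistic
t_c = 2.423                           # 1% right-tailed critical value (from slide)

reject = t_stat > t_c                 # right-tailed decision rule
print(round(t_stat, 2), reject)       # 0.52 False
```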
TESTS OF SIGNIFICANCE
t-statistic

[Regression output omitted.] The hypothesis is rejected at the 5% level.
P-VALUES
TESTING HYPOTHESES ABOUT INDIVIDUAL
COEFFICIENTS
2. The p-value method
Selection of the level of significance in hypothesis tests is arbitrary:
the choice of level is driven by convention.
1%, 5% and 10% are the levels most commonly used.
An alternative is to find the p-value (probability value):
this is the lowest probability (or level of significance) at which we can reject
H0
The smaller the p-value, the stronger is the evidence against the null
hypothesis.
We reject H0 at all levels of significance above the p-value but fail to
reject it at all levels below p:
[Figure: probability line from 0 to 1; H0 is rejected for all significance
levels above p and not rejected for all levels below p.]
Example: The p-value on the income coefficient is 0.020.
This means that the lowest probability level at which you can reject the null
hypothesis (that there is no relationship between consumers' incomes and the
consumption of chicken) is 2%.
Thus we reject H0 at all levels above 2% (such as 5%) but fail to reject it at
all levels below 2% (such as 1%).
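A p-value is just a tail area of the t-distribution. As a rough check on the earlier critical value example (t = −3.62), the sketch below integrates the t density numerically; df = 36 is an assumption (n − k = 40 − 4 for the four-coefficient regression), and the numerical scheme is illustrative, not how a statistics package would do it:

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, grid=20000, span=60.0):
    """Two-tailed p-value: twice the tail area beyond |t|, via the trapezoid rule."""
    a, b = abs(t_stat), span
    h = (b - a) / grid
    tail = h * (t_pdf(a, df) / 2
                + sum(t_pdf(a + i * h, df) for i in range(1, grid))
                + t_pdf(b, df) / 2)
    return 2 * tail

# p-value for the price-of-chicken test: t = -3.62, assumed df = 36.
p = two_tailed_p(-3.62, 36)
print(round(p, 4))  # well below 0.01, so H0 is rejected even at the 1% level
```

This matches the earlier conclusion: since the p-value is below 1%, H0 is rejected at every conventional significance level.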