Basic Estimation Techniques: Essential Concepts
Essential Concepts
1. A simple linear regression model relates a dependent variable Y to a single independent (or
explanatory) variable X in a linear fashion: Y = a + bX.
The intercept parameter (a) gives the value of Y at the point where the regression line crosses the Y-
axis, which is the value of Y when X is zero. The slope parameter (b) gives the change in Y associated
with a one-unit change in X (b = ΔY/ΔX).
The parameters a and b are estimated by choosing the values $\hat{a}$ and $\hat{b}$ that minimize the sum of the squared
residuals. The residual is the difference between the actual value of Y and the fitted value of Y, $Y_i - \hat{Y}_i$.
This method of estimating a and b is called the method of least-squares, and the estimated regression
line, $\hat{Y} = \hat{a} + \hat{b}X$, is called the sample regression line. The sample regression line is an estimate of the
true regression line.
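The least-squares calculation can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the text: the function name `least_squares` and the five data points are hypothetical, and the formulas are the standard closed-form solution for a simple regression with one explanatory variable.

```python
def least_squares(xs, ys):
    """Return (a_hat, b_hat), the intercept and slope that minimize
    the sum of squared residuals for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical sample, invented for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = least_squares(xs, ys)

# Residual = actual Y minus fitted Y on the sample regression line
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
sum_sq_resid = sum(r ** 2 for r in residuals)

print("a_hat:", a_hat)
print("b_hat:", b_hat)
print("sum of squared residuals:", sum_sq_resid)
```

When an intercept is included, the least-squares residuals sum to zero, which is a quick sanity check on any hand calculation.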
4. The estimates $\hat{a}$ and $\hat{b}$ do not, in general, equal the true values of a and b. Since $\hat{a}$ and $\hat{b}$ are
computed using data from a random sample, the estimates themselves are random variables—the
estimates would vary in value from one random sample to another random sample. Statisticians have
shown that the distribution of values that the estimates might take is centered around the true value of
the parameter. An estimator is unbiased if the average value, or the expected value, of the estimator is
equal to the true value of the parameter. The method of least-squares can produce unbiased estimates
of a and b.
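A small simulation can make the idea of an unbiased estimator concrete. The sketch below is illustrative only: the true parameter values, the error distribution (normal with standard deviation 1), the sample size, and the number of repetitions are all assumptions chosen for the example. It draws many random samples from the same true regression line and looks at how the slope estimate varies.

```python
import random


def least_squares(xs, ys):
    """Closed-form least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


rng = random.Random(0)          # fixed seed so the run is repeatable
TRUE_A, TRUE_B = 2.0, 0.5       # hypothetical true parameters
xs = [float(i) for i in range(1, 21)]

slopes = []
for _ in range(2000):
    # Each sample shares the same X values but gets fresh random errors
    ys = [TRUE_A + TRUE_B * x + rng.gauss(0.0, 1.0) for x in xs]
    _, b_hat = least_squares(xs, ys)
    slopes.append(b_hat)

mean_slope = sum(slopes) / len(slopes)
print("smallest and largest slope estimate:", min(slopes), max(slopes))
print("average slope estimate:", mean_slope)
```

Individual estimates differ from sample to sample, but their average should sit close to the true slope of 0.5, which is what unbiasedness means.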
5. It is the randomness of the parameter estimates that necessitates testing for statistical significance.
Just because the estimate $\hat{b}$ is not zero does not mean the true value of b is not zero. Even when b does
equal zero, it is still possible that the sample will produce a least-squares estimate that is different
from zero. Thus, it is necessary to determine if there is sufficient statistical evidence in the sample to
indicate that Y is truly related to X (i.e., b ≠ 0).
6. There are two ways to determine whether an estimated parameter is statistically significant. Either a t-
test can be performed or the p-value for the parameter estimate can be examined.
7. To perform a t-test for significance, a researcher must first determine the level of significance for the
test. The significance level of a test is the probability of finding a parameter estimate to be
significantly different from zero when, in fact, b is zero. This mistake is called a Type I error. Lower
levels of significance, other things equal, are more desirable. One minus the level of significance is
called the level of confidence.
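Once the level of significance is chosen, the t-ratio is computed as
$t = \hat{b} / S_{\hat{b}}$,
where $S_{\hat{b}}$ is the standard error of the estimate $\hat{b}$. The critical t-value is then found in a t-table
for the chosen level of significance and n – k degrees of freedom, where n is the number of
observations and k is the number of parameters
estimated. If the absolute value of the t-ratio is greater (less) than the critical t-value, then $\hat{b}$ is (is not)
statistically significant.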
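The t-test procedure can be sketched as follows. The data are hypothetical, the function name `slope_t_ratio` is invented for this example, and the standard error uses the usual simple-regression formula (residual variance divided by the sum of squared deviations of X). The critical value 3.182 is the two-tailed 5 percent value for 3 degrees of freedom, as read from a standard t-table.

```python
import math


def slope_t_ratio(xs, ys):
    """Return (t_ratio, degrees_of_freedom) for the slope estimate
    in a simple regression with k = 2 estimated parameters."""
    n = len(xs)
    k = 2  # intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    # Sum of squared residuals around the sample regression line
    sse = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope estimate
    s_b = math.sqrt(sse / (n - k) / sxx)
    return b_hat / s_b, n - k


# Hypothetical sample, invented for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

t_ratio, dof = slope_t_ratio(xs, ys)

# Two-tailed 5% critical value for 3 degrees of freedom (from a t-table)
T_CRITICAL = 3.182
significant = abs(t_ratio) > T_CRITICAL

print("t-ratio:", t_ratio, "with", dof, "degrees of freedom")
print("statistically significant at the 5% level:", significant)
```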
8. An alternative method of assessing the statistical significance of parameter estimates is to treat as
statistically significant only those parameter estimates whose p-values are smaller than the maximum
acceptable significance level. The p-value gives the exact level of significance for a parameter
estimate, which is the probability of finding significance when none exists.
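Regression software reports p-values directly, but the calculation can be approximated by hand. The sketch below integrates the Student's t density numerically with Simpson's rule; the function names and the step count are choices made for this example, not part of the text. It is applied to a t-ratio of 3.44 with 50 degrees of freedom, the figures that appear in the answers later in this document, where the exact significance level is quoted as 0.12%.

```python
import math


def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log(1 + x * x / dof))


def two_tailed_p(t_ratio, dof, steps=2000):
    """Approximate two-tailed p-value: probability of a t-ratio at least
    this large in absolute value when the true parameter is zero.
    steps must be even because Simpson's rule is used."""
    upper = abs(t_ratio)
    h = upper / steps
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    # Area between -|t| and +|t| (the density is symmetric around zero)
    central_area = 2 * total * h / 3
    return 1.0 - central_area


p_value = two_tailed_p(3.44, 50)
MAX_ACCEPTABLE = 0.05  # hypothetical significance level chosen by the researcher

print("approximate p-value:", p_value)
print("significant:", p_value < MAX_ACCEPTABLE)
```

The decision rule is the same as in the t-test: a p-value below the maximum acceptable significance level means the estimate is treated as statistically significant.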
9. The coefficient of determination $R^2$ measures the percentage of the total variation in the dependent
variable that is explained by the regression equation. The value of $R^2$ ranges from 0 to 1. A high $R^2$
indicates Y and X are highly correlated and the scatter diagram tightly fits the sample regression line.
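As a rough illustration, $R^2$ can be computed as one minus the ratio of unexplained variation to total variation. The data and the fitted line below (intercept 0.05, slope 1.99) are hypothetical values chosen for the example, and `r_squared` is a name invented here.

```python
def r_squared(ys, fitted):
    """Share of the total variation in Y explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    total_variation = sum((y - y_bar) ** 2 for y in ys)
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1.0 - unexplained / total_variation


# Hypothetical sample and its least-squares line (a_hat = 0.05, b_hat = 1.99)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [0.05 + 1.99 * x for x in xs]

r2 = r_squared(ys, fitted)
print("R-squared:", r2)
```

Because these points lie almost exactly on a line, the result is close to 1; a scatter that fits the line loosely would give a value nearer 0.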
10. The F-test is used to test for significance of the overall regression equation. The F-statistic from the
computer printout is compared to the critical F-value obtained from the F-table at the end of your
textbook. The critical F-value is identified by two separate degrees of freedom and the significance
level. The first of the degrees of freedom is k – 1 and the second is n – k (n observations, k estimated parameters). If the value for the
calculated F-statistic (calculated by the computer) exceeds the critical F-value, the regression
equation overall is statistically significant at the specified significance level. Alternatively, if the p-
value for the F-statistic is smaller than the acceptable level of significance, the equation as a whole is
statistically significant.
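The F-statistic printed by regression software can also be recovered from $R^2$ with the standard identity F = [R²/(k – 1)] / [(1 – R²)/(n – k)]. The sketch below applies it to figures that appear in the answers later in this document (R² = 0.5452, critical F of 3.18 for 2 and 50 degrees of freedom); n = 53 and k = 3 are inferred from those degrees of freedom, and the function name is invented for this example.

```python
def f_statistic(r2, n, k):
    """F-statistic for the overall regression, computed from R-squared,
    n observations, and k estimated parameters."""
    explained_per_df = r2 / (k - 1)
    unexplained_per_df = (1.0 - r2) / (n - k)
    return explained_per_df / unexplained_per_df


# R-squared and critical value taken from the answers in this document;
# n and k are inferred from the degrees of freedom (k - 1 = 2, n - k = 50)
f_value = f_statistic(0.5452, 53, 3)
F_CRITICAL = 3.18

print("F-statistic:", round(f_value, 2))   # about 29.97
print("overall equation significant:", f_value > F_CRITICAL)
```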
11. Multiple regression uses more than one explanatory variable to explain the variation in the dependent
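variable. The coefficient for each of the explanatory variables measures the change in Y associated
with a one-unit change in that explanatory variable, holding the other explanatory variables constant.
In a log-linear regression model, the coefficients b and c are elasticities. For example, b measures the percent change in Y that results
when X changes by 1 percent.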
1. a. The intercept a is expected to be positive because even if no advertising is undertaken, some sales
are expected to occur. b is expected to have a positive sign since Vanguard's sales are positively
related to its level of advertising expenditures. Vanguard's sales should be inversely related to its
rivals' expenditures on advertising, so c is expected to be negative.
b. a is the sales of Bright Side detergent when neither Vanguard nor its rivals advertise. b is ΔS/ΔA,
the increase in Bright Side sales attributable to a $1,000 per week increase in advertising
expenditures by Vanguard. c is ΔS/ΔR, the decrease in Bright Side sales attributable to a $1,000
per week increase in advertising expenditures by Vanguard's rivals.
c. The exact level of significance of $\hat{b}$ is 0.0128. If b were truly zero, there would be only a 1.28%
chance of finding an estimate this significant, which is better than the 10 percent level required by
the marketing director.
d. The exact level of significance of $\hat{c}$ is 0.0927. If rivals’ spending on advertising did not
affect Vanguard’s sales (i.e., c = 0), there would be a 9.27% chance of finding an estimate this
significant, which is just barely better than the 10 percent level required by the marketing director.
e. About 78 percent of the variation in sales remains unexplained. The manager should look for
additional explanatory variables that have a significant effect on S, such as the price of its
detergent and the prices of its rivals’ detergents.
2. a. At the 95% level of confidence, the critical F-value is $F_{k-1,\,n-k} = F_{1,15} = 4.54$. Since the computed
F-ratio 42.674 is greater than 4.54, the regression equation provides evidence of a statistically
significant relation. Note also that 74% of the variation in V is explained by the equation.
b. The critical t-value for n – k = 15 degrees of freedom and a 95% level of confidence is 2.131. For
$\hat{a}$: t = 25.418 > 2.131; statistically significant.
If Proposition 103 has no impact on auto insurance
premiums in any given county, P = 0, and the expected percentage of voters favoring Proposition
103 in that particular county is given by:
V = 53.682 – 0.528(0) = 53.682,
or 53.7% are expected to favor Proposition 103.
b. Since ΔE/ΔN is estimated to be 32.31, each extra ticket sold in December is expected to increase
annual earnings by $32.31.
b. b = %ΔQ/%ΔH
c = %ΔQ/%ΔS
A 20% increase in S will increase Q by 5.1% (= 0.2550 × 20).
c. Perform an F-test. The 5 percent F-value is $F_{k-1,\,n-k} = F_{2,50} = 3.18$. Since 29.97 > 3.18, the overall
equation is statistically significant. The p-value for the F-statistic indicates significance below the 0.01%
level.
d. 54.52% of the variation in Q is explained by this model. The $R^2$ could be increased by adding
some additional explanatory variables, such as the sales experience of the salespersons employed,
whether the sales day is a weekday or a Saturday/Sunday, and the level of advertising in
newspapers the previous week.
f. For $\hat{b}$: t = 3.44 > 2; statistically significant. The p-value indicates exact significance at the
0.12% level.
Since b = %ΔQ / %ΔH, $\hat{b}$ is the estimated percentage increase in sales attributable to increasing the
hours of operation by 1%, all else constant. Since $\hat{b}$ = 0.3517, a 10 percent decrease in H will
decrease sales by 3.52 percent (= 0.3517 × 10).
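The elasticity arithmetic in these answers can be checked with a short sketch. It uses the two estimates quoted above (0.3517 for H and 0.2550 for S). The function names are invented for this example, and the "exact" version assumes a constant-elasticity form in which Q is proportional to H raised to the power b, which is an assumption about the underlying model rather than something stated in the answers.

```python
def approx_pct_change(elasticity, pct_change_x):
    """Linear approximation used in the answers: elasticity times %change in X."""
    return elasticity * pct_change_x


def exact_pct_change(elasticity, pct_change_x):
    """Percent change in Y implied by an assumed constant-elasticity form
    Y = a * X**elasticity, for comparison with the approximation."""
    return ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0) * 100.0


B_HAT = 0.3517   # estimated elasticity of Q with respect to H (from the answers)
C_HAT = 0.2550   # estimated elasticity of Q with respect to S (from the answers)

approx_h = approx_pct_change(B_HAT, -10.0)
exact_h = exact_pct_change(B_HAT, -10.0)
approx_s = approx_pct_change(C_HAT, 20.0)

print("10% decrease in H, approximate % change in Q:", approx_h)
print("10% decrease in H, constant-elasticity % change in Q:", exact_h)
print("20% increase in S, approximate % change in Q:", approx_s)
```

The approximation and the constant-elasticity figure differ slightly for changes as large as 10 or 20 percent; the two converge as the percentage change shrinks.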