Multiple Regression: Prem Mann, Introductory Statistics, 7/E
MULTIPLE REGRESSION
Opening Example
MULTIPLE REGRESSION
ANALYSIS
Definition
A regression model that includes two or
more independent variables is called a
multiple regression model. It is written as
$$y = A + B_1 x_1 + B_2 x_2 + B_3 x_3 + \cdots + B_k x_k + \epsilon$$
where $y$ is the dependent variable, $x_1, x_2,
x_3, \ldots, x_k$ are the $k$ independent variables,
and $\epsilon$ is the random error term.
Prem Mann, Introductory Statistics, 7/E
Copyright 2010 John Wiley & Sons. All rights reserved
MULTIPLE REGRESSION
ANALYSIS
When each of the xi variables represents a
single variable raised to the first power as
in the above model, this model is referred
to as a first-order multiple regression
model. For such a model with a sample
size of n and k independent variables, the
degrees of freedom are: df = n - k - 1
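As a rough illustration of the first-order model and its degrees of freedom, the sketch below estimates the coefficients by least squares using the normal equations, in plain Python. The function name `fit_first_order` and the data are hypothetical (the y-values were constructed to follow y = 1 + 2x1 + 3x2 exactly, with no random error), so this is a sketch under those assumptions, not the MINITAB procedure used later in the chapter.

```python
def fit_first_order(xs, y):
    """Least-squares estimates [a, b1, ..., bk] for y = A + B1*x1 + ... + Bk*xk + error.

    xs holds one row per observation, each row listing the k x-values.
    Returns the estimates and the degrees of freedom n - k - 1.
    Assumes the x-columns are not perfectly collinear.
    """
    n, k = len(xs), len(xs[0])
    rows = [[1.0] + [float(v) for v in r] for r in xs]  # prepend the constant term
    p = k + 1
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(p):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[c])]
    coef = [m[i][p] / m[i][i] for i in range(p)]
    return coef, n - k - 1


# Hypothetical data: n = 6 observations, k = 2 independent variables.
xs = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [6, 4]]
y = [3, 8, 10, 18, 17, 25]  # built as 1 + 2*x1 + 3*x2, no error term

coef, df = fit_first_order(xs, y)
print(coef, df)
```

Because the made-up data contain no error, the estimates should come out close to 1, 2, and 3, and df = 6 - 2 - 1 = 3.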
COEFFICIENT OF MULTIPLE
DETERMINATION
The coefficient of determination for the
multiple regression model, usually called
the coefficient of multiple
determination, is denoted by $R^2$ and is
defined as the proportion of the total sum
of squares SST that is explained by the
multiple regression model. It tells us how
good the multiple regression model is and
how well the independent variables
included in the model explain the
dependent variable.
COEFFICIENT OF MULTIPLE
DETERMINATION
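$$0 \le R^2 \le 1$$
$$SSE = \sum e^2 = \sum (y - \hat{y})^2$$
$$SST = SS_{yy} = \sum (y - \bar{y})^2$$
$$SSR = \sum (\hat{y} - \bar{y})^2$$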
COEFFICIENT OF MULTIPLE
DETERMINATION
SSR is the portion of SST that is explained
by the use of the regression model, and
SSE is the portion of SST that is not
explained by the use of the regression
model. The coefficient of multiple
determination is given by the ratio of SSR
and SST as follows.
$$R^2 = \frac{SSR}{SST}$$
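The three sums of squares and the ratio SSR/SST can be checked numerically. The sketch below uses a small hypothetical data set (four observations with x1 = -1, -1, 1, 1 and x2 = -1, 1, -1, 1); the fitted values listed are the least-squares fitted values for that made-up data, and the function name `sums_of_squares` is an assumption for illustration.

```python
def sums_of_squares(y, y_hat):
    """Return (SSE, SST, SSR) for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    return sse, sst, ssr


# Hypothetical observations and their least-squares fitted values
# (y_hat = 4 + 2*x1 + 1.5*x2 for the made-up design described above).
y = [1.0, 3.0, 4.0, 8.0]
y_hat = [0.5, 3.5, 4.5, 7.5]

sse, sst, ssr = sums_of_squares(y, y_hat)
r2 = ssr / sst
print(sse, sst, ssr, r2)
```

For a least-squares fit that includes a constant term, SST = SSR + SSE, so the same value of R squared can also be obtained as 1 - SSE/SST.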
Characteristics of $R^2$
The value of $R^2$ generally increases as we
add more and more explanatory variables
to the regression model (even if they do
not belong in the model).
A higher value of $R^2$ does not imply
that the regression equation with the higher
$R^2$ does a better job of predicting
the dependent variable.
A value of $R^2$ inflated in this way does not
represent the true explanatory
power of the regression model.
Characteristics of $R^2$
Instead, we use the adjusted coefficient of
multiple determination, $\bar{R}^2$.
The value of $\bar{R}^2$ may increase, decrease, or
stay the same as we add more
explanatory variables to our regression
model.
If a new variable added to the regression
model contributes significantly to explaining
the variation in $y$, then $\bar{R}^2$ increases;
otherwise it decreases. The value of $\bar{R}^2$ is
calculated as follows.
Characteristics of $R^2$
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} \quad\text{or}\quad 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$
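A short numerical check may help show that the two forms of the adjusted coefficient agree, and how the adjustment penalizes extra variables. The values SSE = 90 and SST = 1000 are hypothetical, chosen only for easy arithmetic; n = 12 and k = 2 match the sample size and number of independent variables in the example that follows. The function names are assumptions for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of multiple determination from R^2, n, and k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def adjusted_r2_from_ss(sse, sst, n, k):
    """Same quantity computed from the sums of squares."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical sums of squares (not taken from any output in the text).
sse, sst = 90.0, 1000.0
n, k = 12, 2

r2 = 1 - sse / sst                          # equals SSR / SST for a least-squares fit
adj_a = adjusted_r2(r2, n, k)
adj_b = adjusted_r2_from_ss(sse, sst, n, k)
print(r2, adj_a, adj_b)

# Same R^2 with three more variables (k = 5): the adjusted value drops.
adj_more_vars = adjusted_r2(r2, n, 5)
print(adj_more_vars)
```

With these made-up numbers the unadjusted value is 0.91, while the adjusted value is smaller, and it becomes smaller still if the same fit were obtained with more independent variables.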
COMPUTER SOLUTION OF
MULTIPLE REGRESSION
In this section, we take an example of a
multiple regression model, solve it using
MINITAB, interpret the solution, and make
inferences about the population parameters
of the regression model.
Example 14-1
A researcher wanted to find the effect of
driving experience and the number of
driving violations on auto insurance
premiums. A random sample of 12 drivers
insured with the same company and
having similar auto insurance policies was
selected from a large city.
Example 14-1
Table 14.1 lists the monthly auto insurance
premiums (in dollars) paid by these
drivers, their driving experiences (in
years), and the numbers of driving
violations committed by them during the
past three years. Using MINITAB, find the
regression equation of monthly premiums
paid by drivers on the driving experiences
and the numbers of driving violations.
Table 14.1
$$y = A + B_1 x_1 + B_2 x_2 + \epsilon$$
Enter the given data of Table 14.1 in
columns C1, C2, and C3 into MINITAB.
Name them Monthly Premium, Driving
Experience and Driving Violations, as
shown in Screen 14.1.
Example 14-2
Refer to Example 14-1 and the MINITAB
solution given in Screen 14.3.
(a) Explain the meaning of the estimated
regression coefficients.
(b) What are the values of the standard
deviation of errors, the coefficient of
multiple determination, and the adjusted
coefficient of multiple determination?
Example 14-2
(c) What is the predicted auto insurance
premium paid per month by a driver with
seven years of driving experience and
three driving violations committed in the
past three years?
(d) What is the point estimate of the
expected (or mean) auto insurance
premium paid per month by all drivers with
12 years of driving experience and 4
driving violations committed in the past
three years?
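Parts (c) and (d) use the same calculation: the estimated regression equation is evaluated at the given values of the independent variables, whether the result is read as a predicted premium for one driver or as a point estimate of the mean premium for all such drivers. The sketch below shows the arithmetic only. The coefficients `a`, `b1`, and `b2` are hypothetical placeholders, not the estimates reported in Screen 14.3, and should be replaced with the values from the MINITAB output.

```python
def predict_premium(a, b1, b2, experience, violations):
    """Evaluate y_hat = a + b1*x1 + b2*x2 for one (x1, x2) pair."""
    return a + b1 * experience + b2 * violations


# HYPOTHETICAL coefficients for illustration only (not from Screen 14.3).
a, b1, b2 = 100.0, -3.0, 16.0

# (c) one driver: 7 years of experience, 3 violations
y_hat_c = predict_premium(a, b1, b2, 7, 3)

# (d) mean for all drivers with 12 years of experience, 4 violations
y_hat_d = predict_premium(a, b1, b2, 12, 4)

print(y_hat_c, y_hat_d)
```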
Confidence interval for $B_i$:
$$b_i \pm t\, s_{b_i}$$
Example 14-3
Determine a 95% confidence interval for
B1 (the coefficient of experience) for the
multiple regression of auto insurance
premium on driving experience and the
number of driving violations. Use the
MINITAB solution of Screen 14.3.
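The interval for this example can be sketched as follows. With n = 12 drivers and k = 2 independent variables, df = n - k - 1 = 9, and the t-table value for 0.025 area in the right tail with 9 degrees of freedom is 2.262. The estimate `b1` and its standard deviation `s_b1` below are hypothetical placeholders, not the values in Screen 14.3; substitute the values from the MINITAB output.

```python
def confidence_interval(b, s_b, t_value):
    """Return (lower, upper) limits of b +/- t * s_b."""
    margin = t_value * s_b
    return b - margin, b + margin


n, k = 12, 2
df = n - k - 1        # 9 degrees of freedom
t_value = 2.262       # t table: 0.025 in the right tail, df = 9

# HYPOTHETICAL estimate and standard deviation (not from Screen 14.3).
b1, s_b1 = -3.0, 1.0

lower, upper = confidence_interval(b1, s_b1, t_value)
print(df, lower, upper)
```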
$$t = \frac{b_i - B_i}{s_{b_i}}$$
The value of Bi is substituted from the
null hypothesis. Usually, but not always,
the null hypothesis is H0: Bi = 0. The
MINITAB solution contains this value of
the t statistic.
Example 14-4
Using the 2.5% significance level, can
you conclude that the coefficient of the
number of years of driving experience in
regression model (3) is negative? Use
the MINITAB output obtained in Example
14-1 and shown in Screen 14.3 to
perform this test.
$H_1: B_1 < 0$
Step 2:
The sample size is small (n < 30)
$\sigma_\epsilon$ is not known
Figure 14.1
Step 4:
The value of the test statistic t for b1 can be
obtained from the MINITAB solution given in
Screen 14.3. Thus, the observed value of t is
$$t = \frac{b_i - B_i}{s_{b_i}} = -2.81$$
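The decision step of this left-tailed test can be sketched as below. The sketch assumes the observed statistic is t = -2.81 (the negative sign is an assumption consistent with a negative estimated coefficient), uses df = 12 - 2 - 1 = 9, and takes the critical value from the t table: 2.262 for 0.025 area in one tail with 9 degrees of freedom. The function names are hypothetical.

```python
def t_statistic(b, B_null, s_b):
    """Test statistic t = (b - B) / s_b, with B taken from the null hypothesis."""
    return (b - B_null) / s_b


def reject_left_tailed(t_observed, t_table_value):
    """Reject H0 when the observed t falls in the left-tail rejection region."""
    return t_observed < -abs(t_table_value)


n, k = 12, 2
df = n - k - 1          # 9 degrees of freedom
t_table_value = 2.262   # t table: 0.025 in one tail, df = 9
t_observed = -2.81      # assumed sign; magnitude as reported for b1

reject_h0 = reject_left_tailed(t_observed, t_table_value)
print(df, reject_h0)
```

Under these assumptions the observed value lies below the critical value of -2.262, which would lead to rejecting the null hypothesis in favor of B1 < 0 at the 2.5% significance level.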