Week 9 Lesson 2: Multivariate Statistical Technique - Multiple Regression Analysis

Analysis of Dependence
Multivariate dependence techniques are variants of the general linear model
(GLM). Simply, the GLM is a way of modeling some process based on how
different variables cause fluctuations from the average dependent variable.
Fluctuations can come in the form of group means that differ from the overall
mean as in ANOVA or in the form of a significant slope coefficient as in
regression. The basic idea can be thought of as follows:
Yi = μ + ΔX + ΔF + ΔXF
Here, μ represents a constant, which can be thought of as the overall mean of
the dependent variable, ΔX and ΔF represent changes due to main effect
independent variables (such as experimental variables) and blocking
independent variables (such as covariates or grouping variables), respectively,
and ΔXF represents the change due to the combination (interaction effect) of
those variables. Realize that Yi in this case could represent multiple dependent
variables, just as X and F could represent multiple independent variables.
Multiple regression analysis, n-way ANOVA, and MANOVA represent common
forms that the GLM can take.
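The GLM framing above can be sketched with a short simulation (all data and effect sizes here are hypothetical). The same generated variable shows both views: group means that differ from the overall mean (ANOVA) and a nonzero slope on a metric predictor (regression):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

group = rng.integers(0, 2, n)   # blocking/grouping variable (F)
x = rng.normal(0, 1, n)         # main-effect metric variable (X)
mu = 10.0                       # overall mean of the dependent variable

# Y fluctuates around mu via a slope effect, a group effect,
# and their interaction, plus random error.
y = mu + 1.5 * x + 2.0 * group + 0.5 * x * group + rng.normal(0, 1, n)

# ANOVA view: the group means differ from the overall mean ...
print(y.mean(), y[group == 0].mean(), y[group == 1].mean())

# ... regression view: the slope of y on x is nonzero.
slope = np.polyfit(x, y, 1)[0]
print(slope)
```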
Multiple Regression Analysis
Multiple regression analysis is an extension of simple regression analysis
that allows a metric dependent variable to be predicted by multiple independent
variables. Simple linear regression was illustrated with an example explaining
a construction dealer’s sales volume with the number of building permits issued.
Thus, one dependent variable (sales volume) is explained by one independent
variable (number of building permits). Yet reality is more complicated and
several additional factors probably affect construction equipment sales. Other
plausible independent variables include price, seasonality, interest rates,
advertising intensity, consumer income, and other economic factors in the area.
The simple regression equation can be expanded to represent multiple
regression analysis:
Yi = b0 + b1X1 + b2X2 + b3X3 + . . . + bnXn + ei
Thus, as a form of the GLM, dependent variable predictions (Ŷ) are made by
adjusting the constant (b0), which would be equal to the mean if all slope
coefficients are 0, based on the slope coefficients associated with each
independent variable (b1, b2, . . . , bn). Less-than interval (nonmetric)
independent variables can be used in multiple regression. This can be done by
implementing dummy variable coding. A dummy variable is a variable that uses
a 0 and a 1 to code the different levels of a dichotomous variable (for instance,
residential or commercial building permit). Multiple dummy variables can be
included in a regression model. For example, dummy coding is appropriate when
data from two countries are being compared. Suppose the average labor rate
for automobile production is included in a sample taken from respondents in the
United States and in South Korea. A response from the United States could be
assigned a 0 and responses from South Korea could be assigned a 1 to create a
country variable appropriate for use with multiple regression.
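A minimal sketch of the expanded regression equation together with dummy coding can be put in code. The data below are synthetic and the effect sizes are invented for illustration; only the coding scheme (United States = 0, South Korea = 1) follows the text:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Synthetic sample: a metric predictor plus a dummy-coded country
# variable (United States -> 0, South Korea -> 1), as described above.
labor_rate = rng.normal(loc=25.0, scale=3.0, size=n)
south_korea = rng.integers(0, 2, size=n)   # dummy variable

# Hypothetical true model: Y = b0 + b1*labor_rate + b2*south_korea + e
y = 10.0 + 2.0 * labor_rate - 5.0 * south_korea + rng.normal(size=n)

# Ordinary least squares fit; the leading column of ones in the
# design matrix estimates the constant b0.
X = np.column_stack([np.ones(n), labor_rate, south_korea])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(b0, b1, b2)   # should recover values close to 10, 2, -5
```

With enough data, the estimated slope coefficients recover the values used to generate the sample, and the dummy's coefficient reads as the shift in Y for South Korean responses relative to the 0-coded United States baseline.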
A SIMPLE EXAMPLE
Assume that a toy manufacturer wishes to explain store sales (dependent
variable) using a sample of stores from Canada and Europe. Several hypotheses
are offered:
• H1: Competitor’s sales are related negatively to our firm’s sales.
• H2: Sales are higher in communities that have a sales office than in those
where no sales office is present.
• H3: Grammar school enrollment in a community is related positively to sales.
Competitor’s sales is how much the primary competitor sold in the same stores
over the same time period. Both the dependent variable and the competitor’s
sales are ratio variables measured in euros (Canadian sales were converted to
euros). The presence of a sales office is a categorical variable that can be
represented with dummy coding (0 = no office in this particular community, 1 =
office in this community). Grammar school enrollment is also a ratio variable
simply represented by the number of students enrolled in elementary schools
in each community (in thousands). A sample of 24 communities is gathered
and the data are entered into a regression program to produce the following
results:
Regression equation: Ŷ = 102.18 + 0.387X1 + 115.2X2 + 6.73X3
Coefficient of multiple determination (R2) = 0.845
F-value = 14.6; p < 0.05
Note that all the signs in the equation are positive. Thus, the regression equation
indicates that sales are positively related to X1, X2, and X3. The coefficients show
the effect on the dependent variable of a 1-unit increase in any of the
independent variables. The value or weight, b1, associated with X1 is 0.387.
Thus, a one-unit increase (€1,000) in competitors’ sales volume (X1) in the
community is actually associated with an increase of €387 in the toy
manufacturer’s sales (0.387 × €1,000 = €387). The value of b2 = 115.2, which
indicates that an increase of €115,200 (115.2 thousand) in toy sales is expected
with each additional unit of X2. Thus, it appears that having a company sales
office in a community is associated with a strong positive effect on sales. Grammar
school enrollments also may help predict sales. An increase of 1 unit of
enrollment (1,000 students) indicates a sales increase of €6,730. Because the
effect associated with X1 is positive, H1 is not supported; as competitor sales
increase, our sales increase as well. The effects associated with H2 and H3 are
also positive, which is in the hypothesized direction. Thus, if the coefficients are
statistically significant, H2 and H3 will be supported.
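The fitted equation reported above can be used directly for point prediction. The community values below are hypothetical; the coefficients are the ones from the example:

```python
# Predicted sales from the example's fitted equation:
# Y-hat = 102.18 + 0.387*X1 + 115.2*X2 + 6.73*X3
def predict_sales(competitor_sales, has_office, enrollment):
    """Monetary units in thousands; enrollment in thousands of students."""
    return (102.18
            + 0.387 * competitor_sales   # competitor's sales (X1)
            + 115.2 * has_office         # sales office dummy (X2)
            + 6.73 * enrollment)         # grammar school enrollment (X3)

# Hypothetical community: competitor sells 500 (thousand), a sales
# office is present, and 12 (thousand) students are enrolled.
y_hat = predict_sales(500, 1, 12)
print(y_hat)   # 491.64
```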
REGRESSION COEFFICIENTS IN MULTIPLE REGRESSION
Recall that in simple regression, the coefficient b1 represents the slope of X on
Y. Multiple regression involves multiple slope estimates, or regression weights.
One challenge in regression models is to understand how one independent
variable affects the dependent variable, considering the effect of other
independent variables. When the independent variables are related to each
other, the regression weight associated with one independent variable is
affected by the regression weight of another. Regression coefficients are
unaffected by each other only when independent variables are totally
independent.
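This dependence between regression weights can be demonstrated with a small simulation (synthetic data; variable names are mine). Here x2 is nearly a copy of x1, so the weight estimated for x1 changes sharply depending on whether x2 is in the model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Two highly correlated independent variables.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # correlation near 1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.3, size=n)

def fit(x_cols, y):
    """OLS slopes with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y))] + x_cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# The weight on x1 depends on whether x2 is also in the model:
b_alone = fit([x1], y)[1]       # x1 alone absorbs x2's effect (near 2)
b_joint = fit([x1, x2], y)[1]   # x1 alongside x2 (near its true 1)
print(b_alone, b_joint)
```

When the predictors are uncorrelated instead, the two fits give essentially the same weight for x1, matching the statement that coefficients are unaffected by each other only when the independent variables are independent.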
Conventional regression programs can provide standardized parameter
estimates, β1, β2, and so on, that can be thought of as partial regression
coefficients. The correlation between Y and X1, controlling for the correlation
that X2 has with Y, is called partial correlation. Consider a standardized
regression model with only two independent variables:
Y = β1X1 + β2X2 + ei
The coefficients β1 and β2 are partial regression coefficients, which express
the relationship between the independent variable and dependent variable
taking into consideration that the other variable
also is related to the dependent variable. As long as the correlation between
independent variables is modest, partial regression coefficients adequately
represent the relationships. When the correlation between two independent
variables becomes high, the regression coefficients may not be reliable, as
illustrated in the Research Snapshot on the next page. When researchers want
to know which independent variable is most predictive of the dependent
variable, the standardized regression coefficient (β) is used. One huge
advantage of β is that it provides a constant scale. In other words, the βs are
directly comparable. Therefore, the greater the absolute value of the
standardized regression coefficient, the more that particular independent
variable is responsible for explaining the dependent variable. For example,
suppose in the toy example above, the following standardized regression
coefficients were found:
β1 = 0.10
β2 = 0.30
β3 = 0.10
The resulting standardized regression equation would be
Y = 0.10X1 + 0.30X2 + 0.10X3 + ei
Using standardized coefficients, the researcher concludes that the relationship
between competitor’s sales (X1) and company sales (Y) is the same strength
as is the relationship between grammar school enrollment (X3) and company
sales. Perhaps more important, though, the conclusion can also be reached that
the relationship between having a sales office in the area (X2) and sales is three
times as strong as the other two relationships. Thus, management may wish to
place more emphasis on locating sales offices in major markets.
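The standardized coefficients can be computed from the raw weights by rescaling with the standard deviations, β_k = b_k × (s_Xk / s_Y). A sketch on synthetic data (all values hypothetical) shows why the raw weights are not comparable when predictors sit on very different scales:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Hypothetical predictors on very different scales.
x1 = rng.normal(scale=100.0, size=n)   # e.g. sales in thousands
x2 = rng.normal(scale=1.0, size=n)     # e.g. a small-scale variable
y = 0.01 * x1 + 3.0 * x2 + rng.normal(size=n)

# Raw OLS weights (intercept in column 0).
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Standardized coefficients: beta_k = b_k * (s_xk / s_y).
beta1 = b[1] * x1.std() / y.std()
beta2 = b[2] * x2.std() / y.std()
print(b[1], b[2])       # raw weights: not directly comparable
print(beta1, beta2)     # betas share a common scale
```

Here the raw weight on x1 looks tiny only because x1 is measured in large units; on the standardized scale both predictors contribute meaningfully, with x2's β roughly three times x1's, the same kind of comparison the toy example draws between X2 and the other variables.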