Session 11: Chapter 16-17 Predictive Analysis
Chapter 16-17
Predictive Analysis
Types of analysis used in Marketing Research
- Multiple regression
Regression Analysis
• Is a powerful and flexible procedure for analysing associative relationships between a metric-dependent (i.e. continuous) variable (Y) and one or more independent variables (X)
[Figure: packets bought plotted against advertisements watched, the independent variable (X)]
Bivariate (or linear) Regression Analysis
How should the line be fitted to best describe the data?
One rule is to minimize the total error (in least squares, the sum of the squared errors)
[Figure: fitted line through the data points on the Y axis, with residual errors e4 and e5 marked]
The vertical distance from the point (actual value) to the line is the error:
Residual error ei = Actual value – Predicted value
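The fitting rule and the residual definition above can be sketched in a few lines of Python. This is only an illustration: the data values and the names `fit_line`, `residuals`, `adverts_watched` and `packets_bought` are hypothetical, and the fit uses ordinary least squares, which is one common way to "minimize the total error".

```python
def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Residual error = actual value - predicted value, for each point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: adverts watched (X) and packets bought (Y).
adverts_watched = [1, 2, 3, 4, 5]
packets_bought = [2, 4, 5, 4, 5]

a, b = fit_line(adverts_watched, packets_bought)
errors = residuals(adverts_watched, packets_bought, a, b)
sse = sum(e ** 2 for e in errors)  # the quantity least squares minimizes

print(f"Y = {a:.2f} + {b:.2f} X, sum of squared errors = {sse:.2f}")
```

With an intercept in the model, the positive and negative residuals cancel out, which is why the squared errors, not the raw errors, are summed.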
• Determine how much of the variation in the dependent variable can be explained by the independent variables (i.e. strength of the relationship: R square)
• Control for other independent variables when evaluating the contributions of a specific variable or set of variables
• Widely used for explaining market share, sales, brand preference, intention to purchase, overall experience/satisfaction etc.
• However, it cannot determine causality (i.e., cause and effect relationship between X and Y!)
[Figure: Bivariate (i.e. Linear) Regression, Y against X, compared with Multiple Regression, Y against X1 and X2]
The general rule is to plot a scatter diagram [DV on the vertical axis (Y) and IV on the horizontal axis (X)] to determine the form of the relationship between the variables (i.e. whether the relationship between X and Y is linear)
Understanding Prediction in the Regression Model
• R² ranges from 0 to 1.
– The higher the R², the better the data fit the model (i.e., the goodness of fit is high)
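As a rough illustration of how this goodness-of-fit figure is obtained, the sketch below computes R square as one minus the ratio of unexplained variation to total variation. The function name `r_squared` and both lists of numbers are hypothetical; the predictions are assumed to come from a least squares line fitted to the same data.

```python
def r_squared(actual, predicted):
    """Share of the variation in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot


# Hypothetical actual values and the values predicted by a fitted line.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(actual, predicted)
print(f"R square = {r2:.2f}")
```

An R square of 1 would mean every point lies on the line; a value near 0 would mean the line explains almost none of the variation in Y.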
- Multiple regression
Multiple Regression Analysis
• Multiple regression analysis uses the same concepts as bivariate regression analysis, but uses more than one independent variable (all are metric variables)
• Examples:
– Are consumers’ perceptions of quality determined by their perceptions of prices, brand image, and brand attributes?
– Are consumers’ purchase intentions determined by their perceptions of price, brand image and staff service?
What is the effect of the ratings of “all attributes” on overall satisfaction?
Multiple Regression Analysis (Enter Method)
[Screenshot: input all satisfaction factors into the box, using the ENTER method]
[Output annotations: the negative relationships are not logical in this case; some coefficients are not significant]
We cannot use this model!
Selection of Regression Models
• When multiple variables are involved, different combinations of variables result in different models
• To select the model, many methods are available. One method (Backward elimination) specifies that
1. First, include all relevant variables in the regression model
2. Then, exclude the variables with non-significant coefficients – exclude only one variable at a time. Start from the variable with the largest p-value (e.g., start from “welcome”, then “atmosphere”)
3. Then, exclude the variables whose coefficient direction is not logical – exclude only one variable at a time. (e.g., exclude “tourist information”)
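Steps 1 and 2 of backward elimination can be sketched as a loop that refits the model after each exclusion. Everything here is an assumption made for illustration: the data are made up, the helper names (`solve`, `ols`, `backward_eliminate`) are hypothetical, the 0.05 cut-off is assumed, and p-values use a normal approximation rather than the t distribution a statistics package would report. Step 3 (checking whether a coefficient's direction is logical) is a judgment call and is not coded.

```python
import math


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Fit y on the columns plus a constant; return coefficients and p-values."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    p_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        std_error = math.sqrt(sigma2 * solve(xtx, unit)[j])
        t = beta[j] / std_error
        # ASSUMPTION: two-sided p-value from a normal approximation.
        p_values.append(math.erfc(abs(t) / math.sqrt(2)))
    return beta, p_values


def backward_eliminate(data, target, alpha=0.05):
    """Drop the least significant variable, one at a time, then refit."""
    names = [name for name in data if name != target]
    while names:
        _, p = ols([data[name] for name in names], data[target])
        worst = max(range(len(names)), key=lambda i: p[i + 1])  # skip constant
        if p[worst + 1] <= alpha:
            break  # every remaining variable is significant
        names.pop(worst)
    return names


# Made-up ratings: "welcome" is constructed to be unrelated to the outcome.
data = {
    "accessibility": [1, 2, 3, 4, 5, 6, 7, 8],
    "welcome": [5, 3, 3, 5, 3, 5, 5, 3],
    "overall": [1.4, 2.0, 2.8, 3.8, 4.6, 5.2, 6.0, 7.0],
}
kept = backward_eliminate(data, target="overall")
print(kept)
```

Removing one variable per pass matters because p-values change once a variable leaves the model, so the remaining variables have to be re-evaluated after every exclusion.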
Final Model after Backward elimination
Adjusted R square is 0.705. Thus, satisfaction with accessibility and on-site facilities accounts for 70.5% of the variance in overall satisfaction
• For example,
– The adjusted R square is 0.705
• It corresponds to the model’s goodness of fit after adjusting
for the number of independent variables and sample size
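The adjustment for the number of independent variables and sample size follows the standard formula, adjusted R square = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k the number of independent variables. The sketch below applies it to hypothetical inputs; the function name and the values of R², n and k are assumptions, not figures from the slides.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R square for sample size n and number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs: same R square, same sample, different model sizes.
small_model = adjusted_r_squared(r2=0.60, n=50, k=2)
large_model = adjusted_r_squared(r2=0.60, n=50, k=10)

print(f"k=2:  adjusted R square = {small_model:.3f}")
print(f"k=10: adjusted R square = {large_model:.3f}")
```

For the same R square, the model with more independent variables receives the lower adjusted value, which is why the adjusted figure is the one to compare across candidate models.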
Selection of Regression Models
Another variable is measured on a 7-point scale, where “1” denotes “Not so likely” and “7” denotes “Very likely”:
• Participate: How likely are you to participate in this promotional game?
A multiple regression model is employed to understand whether any factors can drive respondents to participate in the promotional game in Prada’s catalog.
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .769a   .591       .578                .93406
a. Predictors: (Constant), Save Money, High Status, Boring

Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B         Std. Error          Beta                        t         Sig.
1 (Constant)     3.382     .407                                            8.311     .000
  High Status    .102      .050                .143                        2.012     .654
  Boring         -.133     .061                -.155                       -2.172    .031
  Save Money     .204      .047                .296                        4.295     .000
a. Dependent Variable: Participate

It is logical that “Boring” is negatively related to “participate”.
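The unstandardized coefficients (B) in the Coefficients table define a prediction equation: Participate = 3.382 + 0.102 × High Status − 0.133 × Boring + 0.204 × Save Money. The sketch below evaluates it for one respondent. The respondent's ratings are hypothetical, and it is an assumption that the three predictors are rated on a 7-point scale like “Participate”.

```python
# Constant and B values as reported in the Coefficients table.
constant = 3.382
coefficients = {"High Status": 0.102, "Boring": -0.133, "Save Money": 0.204}


def predict_participate(ratings):
    """Predicted 'Participate' score for one respondent's ratings."""
    return constant + sum(coefficients[name] * ratings[name] for name in coefficients)


# Hypothetical respondent (assumed 7-point ratings).
respondent = {"High Status": 5, "Boring": 2, "Save Money": 6}
score = predict_participate(respondent)
print(f"Predicted likelihood to participate: {score:.2f}")
```

Each B value reads as the change in the predicted score for a one-point change in that rating, holding the other ratings constant.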
Both “boring” (p-value = 0.031) and “save money” (p-value = 0.000) are significant independent variables
According to the standardized coefficients (beta), “save money” (Beta = .296) is the most important factor in explaining “participate”, followed by “boring” and then “high status”
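The two readings above, which variables are significant and which matter most, can be reproduced from the Beta and Sig. columns of the Coefficients table. In this sketch the 0.05 significance level is an assumption (the slides do not state the cut-off), and importance is ranked by the absolute value of Beta so that a negative coefficient is not mistaken for a small one.

```python
# (variable, standardized Beta, Sig.) as reported in the Coefficients table.
table = [
    ("High Status", 0.143, 0.654),
    ("Boring", -0.155, 0.031),
    ("Save Money", 0.296, 0.000),
]

ALPHA = 0.05  # ASSUMPTION: conventional 5% significance level

significant = [name for name, beta, p in table if p < ALPHA]
ranked = [name for name, beta, p in sorted(table, key=lambda row: abs(row[1]), reverse=True)]

print("Significant:", significant)
print("Ranked by |Beta|:", ranked)
```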
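The adjusted R Square is 0.578, which indicates a good model fit: the three predictors account for about 57.8% of the variance in “participate” after adjusting for the number of independent variables and sample size.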
Any managerial implications drawn based on the results?
Last but not least…
• When multiple variables are involved, different combinations of variables result in different models.