
Session 11

Chapter 16-17
Predictive Analysis
Types of analysis used in Marketing Research

Predictive analysis: allows one to make forecasts for future events

- Bivariate (i.e. Linear) regression

- Multiple regression
Regression Analysis
• Is a powerful and flexible procedure for analysing associative relationships between a metric-dependent (i.e. continuous) variable (Y) and one or more independent variables (X)

– Independent variable (X): used to predict the dependent variable


• e.g. perceptions of price, brand image, service attributes…
– Dependent variable (Y): that which is predicted
• e.g. total sales revenue, overall satisfaction, intention to repurchase etc.
Regression Analysis
Linear Regression
Regression Analysis
Bivariate (or linear) Regression
Which straight line fits best?
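[Figure: scatter plot with three candidate straight lines (Line A, Line B, Line C). Vertical axis: dependent variable (Y), packets bought. Horizontal axis: independent variable (X), advert. watch.]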
Bivariate (or linear) Regression Analysis
How should the line be fitted to best describe the data? One rule is to minimize the total error.

The vertical distance from the point to the line is the error:
Residual error ei = Actual value – Predicted value

The equation is derived in the form of a straight line by using the least-squares procedure (i.e. the fitted line has the smallest sum of squared vertical distances between the points and the line, Σ ei²).

[Figure: scatter plot of Y against X with a fitted line; the residual errors e1 to e5 are the vertical distances between the actual values and the predicted values on the line.]
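The least-squares rule can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure itself: the data values and the function names `fit_line` and `sum_squared_errors` are made up for the example.

```python
# Hypothetical data (not from the festival data set):
# x = adverts watched, y = packets bought.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared residuals: (actual value - predicted value) squared."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


a, b = fit_line(x, y)
sse_best = sum_squared_errors(x, y, a, b)
# Any other line, e.g. one with a slightly steeper slope, has a larger error.
sse_other = sum_squared_errors(x, y, a, b + 0.2)
print(f"Y = {a:.2f} + {b:.2f} X, SSE = {sse_best:.2f}, other line SSE = {sse_other:.2f}")
```

Comparing `sse_best` with `sse_other` shows why the fitted line is "best": changing the slope or intercept can only increase the sum of squared errors.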
Same Intercept, Different Slopes
Same Slopes, Different Intercepts
Regression Analysis
• Determine whether the independent variables (X) explain a significant variation in the dependent variable (Y) (i.e. whether a relationship exists: F-test, t-test)

• Determine how much of the variation in the dependent variable can be explained by the independent variables (i.e. strength of the relationship: R square)

• Determine the structure of relationship (i.e. mathematical equation)


– Only two variables for bivariate regression
– Three or more variables for multiple regression

• Predict the values of the dependent variables (Y)

• Control for other independent variables when evaluating the contributions of a specific variable
or set of variables

• Widely used for explaining market share, sales, brand preference, intention to purchase, overall experience/satisfaction etc.

• However, it cannot determine causality (i.e., cause and effect relationship between X and Y!)
[Figure: two panels. Bivariate (i.e. Linear) Regression: Y plotted against X. Multiple Regression: Y plotted against X1 and X2.]
The general rule is to plot a scatter diagram [DV on the vertical axis (Y) and IV on the horizontal axis (X)] to determine the form of the relationship between the variables (i.e. whether the relationship between X and Y is linear)
Understanding Prediction in Regression Models

• Two approaches to prediction:


– Extrapolation: use past experience as a means to predict the future
• Identify the pattern over time and forecast that pattern into the future
– E.g. use the past two exams to predict the difficulty of the coming exam

– Predictive model: uses observed relationships among variables to make a prediction
• E.g. a researcher believes that many factors (temperature, wind direction, humidity etc.) predict the weather
• We use this method in most marketing research!
You think the rating of on-site facilities explains overall satisfaction…
Code book
SPSS steps: Regression

• Step 1: open festival.sav data file


• Step 2:

• Step 3: Put “sat_overall” to Dependent variable; “sat_faciltiies” to Independent variable. Select “Enter” Method.
SPSS steps: Regression cont.
• Step 4: select “Statistics” button. Click “Estimates” and “Model fit”.
• Step 5: select “Options” button. Click “Include constant in equation”.
An R-square of 0.653 indicates that the satisfaction of on-site facilities (X) accounts for 65.3% of the variation in overall satisfaction (Y).
The model’s goodness of fit is good.

Test for individual variable significance
The satisfaction of on-site facilities is a significant predictor because its p-value = 0.000 < 0.05

The satisfaction of on-site facilities is positively related to the overall satisfaction (because the coefficient has a positive sign)
=> For every unit increase in the satisfaction level of on-site facilities, the overall satisfaction increases by 0.618.
SPSS output – Bivariate (or Linear Regression)

The regression equation is:
Overall satisfaction = 1.476 + 0.618 (on-site facilities)
1. Coefficient of Determination R2
• R2 is the strength of association between Y and X. It is the square of the simple correlation, r2, when correlating two variables.

• R2 represents the percentage of variation in Y explained by changes in the independent variables.

• R2 ranges from 0 to 1.
– The higher the R2, the better the model fits the data (i.e., the goodness of fit is high)

• In commercial marketing research, a value of R2 ≥ 0.9 is uncommon. Values of 0.5 ≤ R2 ≤ 0.8 may be more reasonable.
• However, for some academic research, values of R2 of 0.3 or above are still acceptable due to the complexity of marketing phenomena.
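The link between R2 and the simple correlation can be checked numerically. The sketch below uses made-up data and hypothetical helper names (`r_squared`, `pearson_r`); it computes R2 as the share of variation in Y explained by the fitted line and compares it with the squared correlation.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def r_squared(x, y):
    """R2 = 1 - (unexplained variation / total variation) for a bivariate fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def pearson_r(x, y):
    """Simple (Pearson) correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


r2 = r_squared(x, y)
r = pearson_r(x, y)
print(f"R2 = {r2:.3f}, r squared = {r ** 2:.3f}")
```

With one independent variable the two numbers agree; with several independent variables only the R2 definition based on explained variation applies.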
2. Test for Significance: Individual Variables

• Coefficient table shows whether there is a linear relationship between the variable Xi and Y

• Use t Test Statistic

– Hypotheses:
• H0: βi = 0 (No linear relationship)
• H1: βi ≠ 0 (Linear relationship between Xi and Y)
– Decision:
• If p value is less than α, reject H0
– Conclusion:
• Xi is a significant predictor of Y
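As a rough sketch of what sits behind the t column of the coefficient table, the slope's t statistic is the coefficient divided by its standard error. The example below uses made-up data and, because it relies only on the standard library, applies the equivalent critical-value form of the decision rule rather than computing a p-value. The value 3.182 is the two-tailed 5% critical value of t with 3 degrees of freedom; the function name `slope_t_statistic` is hypothetical.

```python
# Hypothetical data, for illustration only (n = 5, so df = n - 2 = 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Two-tailed 5% critical value of the t distribution with 3 degrees of freedom.
T_CRITICAL_DF3 = 3.182


def slope_t_statistic(x, y):
    """Return t = slope / standard error of the slope for a bivariate regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    std_error = (sse / (n - 2) / sxx) ** 0.5
    return slope / std_error


t_stat = slope_t_statistic(x, y)
# Rejecting H0 when |t| exceeds the critical value is equivalent to p < 0.05.
reject_h0 = abs(t_stat) > T_CRITICAL_DF3
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

In this small made-up sample the slope is positive but H0 is not rejected, which illustrates that a non-zero coefficient is not automatically a significant one.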
Predictive Analysis
• What is the predicted overall satisfaction (Yi) when the satisfaction score for on-site facilities is 3?
The predicted overall satisfaction
= 1.476 + 0.618 (on-site facilities)
= 1.476 + 0.618(3)
= 3.33
• How about if the rating of on-site facilities is 4?
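The same substitution can be written as a small function. The coefficients 1.476 and 0.618 come from the SPSS output above; the function name `predict_overall_satisfaction` is only an illustrative choice.

```python
# Coefficients taken from the bivariate regression equation in the slides.
INTERCEPT = 1.476
SLOPE = 0.618


def predict_overall_satisfaction(facilities_rating):
    """Predicted overall satisfaction for a given on-site facilities rating."""
    return INTERCEPT + SLOPE * facilities_rating


for rating in (3, 4):
    predicted = predict_overall_satisfaction(rating)
    print(f"facilities = {rating}: predicted overall satisfaction = {predicted:.2f}")
```

For a rating of 3 this reproduces 3.33, and for a rating of 4 it gives 1.476 + 0.618(4) = 3.948, i.e. about 3.95.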
Multiple Regression Analysis
• Multiple regression analysis uses the same concepts as bivariate regression analysis, but uses more than one independent variable (all are metric variables)

• Examples:
– Are consumers’ perceptions of quality determined by their perceptions of prices, brand image, and brand attributes?
– Is purchase intention determined by perceptions of price, brand image and staff service?
What is the effect of the rating of “all attributes” on overall satisfaction?
Multiple Regression Analysis
(Enter Method)

[Figure: SPSS dialog and output for the ENTER method. Callouts: input all satisfaction factors into the box; the negative relationships are not logical in this case; some predictors are not significant.]
We cannot use this model!
Selection of Regression Models
• When multiple variables are involved, different combinations of variables result in different models

• To select the model, many methods are available. One method (Backward elimination) specifies that
1. First, include all relevant variables in the regression model
2. Then, exclude the variables with non-significant coefficients – exclude only one variable at a time. Start from the variable with the largest p-value (e.g., start from “welcome”, then “atmosphere”)
3. Then, exclude the variables whose coefficient direction is not logical – exclude only one variable at a time. (e.g., exclude “tourist information”)
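Steps 1 and 2 of the procedure above can be sketched as a loop. This is an illustration under stated assumptions, not the SPSS implementation: `backward_eliminate` and `stub_fit` are hypothetical names, and the p-values in `stub_fit` are invented placeholders rather than the festival results. In practice `fit` would re-estimate the regression each time, so p-values change after every removal. Step 3 (illogical signs) is left to the researcher's judgment.

```python
def backward_eliminate(candidates, fit, alpha=0.05):
    """Drop the least significant predictor one at a time.

    `fit(variables)` must return {variable: p_value} for a model
    estimated with exactly those predictors.
    """
    kept = list(candidates)
    while kept:
        p_values = fit(kept)
        # Find the predictor with the largest p-value in the current model.
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining predictor is significant
        kept.remove(worst)  # exclude only one variable per round
    return kept


# Placeholder p-values, invented for illustration only.
MADE_UP_P_VALUES = {
    "welcome": 0.80,
    "atmosphere": 0.40,
    "accessibility": 0.01,
    "facilities": 0.001,
}


def stub_fit(variables):
    """Stand-in for a real regression fit; returns fixed made-up p-values."""
    return {name: MADE_UP_P_VALUES[name] for name in variables}


final_model = backward_eliminate(
    ["welcome", "atmosphere", "accessibility", "facilities"], stub_fit
)
print(final_model)
```

Passing the fitting step in as a function keeps the elimination logic separate from whichever statistics package actually estimates the model.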
Final Model after Backward elimination
Adjusted R square is 0.705. Thus, the satisfaction of accessibility and on-site facilities account for 70.5% of the variance in overall satisfaction

The regression equation is:
Overall satisfaction = 0.947 + 0.209 (accessibility) + 0.577 (on-site facilities)
Standardized beta coefficients (b1…bk)
• All variables (Y, X1, X2,…Xk) have been standardized to a mean of 0 and a variance of 1 before estimating the regression equation.
• It indicates the relative importance of alternative predictors

• According to standardized coefficients (beta), on-site facilities (0.754) is more important than accessibility (0.243) to explain the variance in overall satisfaction
• Why?
– When the satisfaction level of “on-site facilities” increases by 1 s.d. (standard deviation), “overall satisfaction” increases by 0.754 s.d.
– When the satisfaction level of “accessibility” increases by 1 s.d. (standard deviation), “overall satisfaction” only increases by 0.243 s.d.
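A standardized coefficient can also be obtained from the unstandardized one by rescaling with the standard deviations: beta = b × (s.d. of X / s.d. of Y). The sketch below shows this for a single predictor using made-up data; the function name `standardized_beta` is hypothetical. With one predictor the result equals the simple correlation between X and Y.

```python
import statistics

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def standardized_beta(x, y):
    """Beta for a bivariate regression: slope rescaled to standard-deviation units."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx  # unstandardized coefficient b
    # A 1 s.d. increase in X changes Y by `beta` standard deviations.
    return slope * statistics.stdev(x) / statistics.stdev(y)


beta = standardized_beta(x, y)
print(f"standardized beta = {beta:.3f}")
```

Because beta is expressed in standard-deviation units, predictors measured on different scales can be compared directly.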
Strength of Association for Multiple Regression

• The R2 will be affected by the number of independent variables and sample size.

• Therefore, the adjusted R2 is used for multiple regression.

• For example,
– The adjusted R square is 0.705
• It corresponds to the model’s goodness of fit after adjusting for the number of independent variables and sample size
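The adjustment uses the standard formula adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the sample size and k the number of independent variables. The sketch below applies it with illustrative inputs; the sample size of 100 is an assumption, since the slides do not report n, and `adjusted_r_squared` is a hypothetical name.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R2 for a model with k independent variables and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Illustrative inputs: R2 = 0.591 with 3 predictors; n = 100 is assumed.
adj = adjusted_r_squared(0.591, n=100, k=3)
print(f"adjusted R2 = {adj:.3f}")

# Adding predictors without improving R2 lowers the adjusted value.
adj_more_predictors = adjusted_r_squared(0.591, n=100, k=10)
print(f"adjusted R2 with 10 predictors = {adj_more_predictors:.3f}")
```

The second call shows the penalty at work: the same R2 is worth less when it takes more independent variables to achieve it.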
Selection of Regression Models

• The ideal model for multiple regression should:
– Have a higher adjusted R2
– Have the predictors with a logical direction
– Have significant predictors (i.e., p value < 0.05)

• However, the interpretation will vary depending on the researcher.
– Sometimes, the model is selected even if some predictors are not significant (in this case, we might assume that the insignificant predictors are not important to affect the outcome variable).
– However, the predictors still need to be in a logical direction
For example
A multiple regression model is employed to understand if any factors can drive respondents to participate in the promotional game in Prada’s catalog.

Respondents’ opinions towards the following statements are measured on a 7-point scale where “1” denotes “strongly disagree” and “7” denotes “strongly agree”.
• Save money: Games save me money on my shopping
• High Status: Prada is high status
• Boring: The game was boring to play

Another variable is measured on a 7-point scale where “1” denotes “Not so likely” and “7” denotes “Very likely”:
• Participate: How likely are you to participate in this promotional game?
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .769a   .591       .578                .93406
a. Predictors: (Constant), Save Money, High Status, Boring

Coefficients(a)
                Unstandardized Coefficients   Standardized Coefficients
Model           B        Std. Error           Beta                        t        Sig.
1 (Constant)    3.382    .407                                             8.311    .000
  High Status   .102     .050                 .143                        2.012    .654
  Boring        -.133    .061                 -.155                       -2.172   .031
  Save Money    .204     .047                 .296                        4.295    .000
a. Dependent Variable: Participate

It’s logical that “Boring” is negatively related to “participate”.

The regression equation is:
Participate = 3.382 + 0.102 (high status) - 0.133 (boring) + 0.204 (save money)

Both “boring” (p-value = 0.031) and “save money” (p-value = 0.000) are significant independent variables

According to standardized coefficients (beta), “save money” (Beta = .296) is the most important factor in explaining “participate”, followed by “boring” and then “high status”

The adjusted R Square is 0.578, which indicates that the goodness of model fit is good.
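To see how the equation is used for prediction, the sketch below plugs one respondent's ratings into the reported coefficients. The coefficients come from the table above; the respondent's ratings (5, 2 and 6 on the 7-point scales) and the names `COEFFICIENTS` and `predict_participate` are made up for illustration.

```python
# Unstandardized coefficients (B) from the Coefficients table.
CONSTANT = 3.382
COEFFICIENTS = {
    "high_status": 0.102,
    "boring": -0.133,
    "save_money": 0.204,
}


def predict_participate(ratings):
    """Predicted likelihood of participating (1 to 7 scale) from the three ratings."""
    return CONSTANT + sum(COEFFICIENTS[name] * ratings[name] for name in COEFFICIENTS)


# Hypothetical respondent: sees Prada as fairly high status, does not find
# the game boring, and agrees that games save money.
respondent = {"high_status": 5, "boring": 2, "save_money": 6}
prediction = predict_participate(respondent)
print(f"predicted participation = {prediction:.2f}")
```

For this respondent the prediction is 3.382 + 0.102(5) - 0.133(2) + 0.204(6) = 4.85; raising the “boring” rating lowers it, in line with the negative coefficient.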
Any managerial implications drawn based on the results?
Last but not least…
• When multiple variables are involved, different combinations of variables result in different models.

• The process is simplified in this lecture…regression analysis is complex and requires additional study to check the appropriateness of our estimated regression model
– e.g. examination of residuals

• Regression is a statistical tool, not a cause-and-effect statement. More exploratory research is needed to understand the causal relationship between the variables.
• The end!
