Marketing-Based Analytics
Module 5 – Marketing Mix Model I
Jinhee Huh
Linear Regression
Linear Regression Examples
[Two example scatterplots of y against x: a non-linear relationship and a linear relationship]
Scatterplot and Variable Relationships
[Scatterplots of $ for iPad against # of Apple products owned, illustrating different variable relationships; the third panel is labeled "Unclear (non-monotonic)"]
Tools to Analyze Linear Relationships
• Correlation
  • Cannot make predictions
• Simple Regression
  • Assumes a relationship between X and Y
  • But can be powerful in terms of prediction and explanation
  • Relies on a set of assumptions
Correlation
• Two Attributes
• Direction: Are X and Y positively or negatively related?
• Strength: Is the relationship weak, moderate, or strong?
Pearson Correlation – Direction
[Two scatterplots of y against x: a positive relationship (r > 0) and a negative relationship (r < 0)]
• Question: which of the two plots possibly depicts the relationship between
• Sales and price?
• Sales and advertising?
Correlation – Strength
• Determined by the absolute value of r (-1 ≤ r ≤ 1)
• The general guideline for the effect size of r:
• But the thresholds may vary by application; let your business context
determine what counts as a strong correlation
[Scatterplot of $ for iPad against # of Apple products owned]
Simple Linear Regression
y = a + bx + e
• For any given x, we can predict the y value:
y_hat = a + bx
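A minimal R sketch of fitting and predicting with a simple linear regression (the data here are simulated for illustration):

```r
# Simulated data: y depends linearly on x plus noise
set.seed(42)
x <- 1:50
y <- 2 + 3 * x + rnorm(50, sd = 5)

# Fit y = a + b*x + e by ordinary least squares
m <- lm(y ~ x)
coef(m)  # a = intercept, b = slope (close to 2 and 3)

# For any given x, predict the y value: y_hat = a + b*x
predict(m, newdata = data.frame(x = 10))
```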
Meaning of Coefficients
• Intercept, a
• The estimated average value of y when x is zero
y = a + b(0) = a
• Slope, b
• The estimated change in the average value of y as a result of a one-unit
change in x
y = a + bx
y' = a + b(x + 1)
b = y' - y
Meaning of Coefficients
[Plot of the fitted line y_hat = b0 + b1x: for a given x*, the residual e is the vertical gap between the observed value of y and the predicted value of y; the slope b1 is the change in y_hat per one unit of x, and the intercept b0 is the value of y_hat at x = 0]
Characterize the Relationship
• Three Attributes
• Presence: does a systematic relationship exist?
• Statistical significance of the slope coefficient
• For the intercept a: test statistic t = a / SE(a)
• For the slope b: test statistic t = b / SE(b)
• R-squared
• Proportion of the total variation in the dependent variable that is explained by the model
Multiple Regression
y = a + b1x1 + b2x2 + ... + bkxk + e
• The F-Test in the ANOVA table can still be used to assess the overall significance of the
model
• Shows if there is a linear relationship between y and all x variables considered together
• Hypotheses for the F test
• H0: b1 = b2 = ... = bk = 0
(none of the independent variables affects y)
• H1: at least one bj ≠ 0
(at least one independent variable affects y)
• If p < 0.05, we reject the null hypothesis. In other words, at least one predictor is
significant, but we don’t know which one is significant.
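A small R sketch of the overall F-test, using simulated data (variable names are illustrative):

```r
# Simulated data: only x1 truly affects y
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 2 * d$x1 + rnorm(100)

m <- lm(y ~ x1 + x2, data = d)

# summary(m)$fstatistic holds the F value and its two df
fstat <- summary(m)$fstatistic
p_value <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
p_value < 0.05  # TRUE: reject H0, at least one predictor matters
```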
Multiple Regression: R-squared
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
Adjusted R-square: Model 1 = 0.18 > Model 2 = 0.17
• Use the adjusted R-square and conclude that Model 1 has a better fit after
adjusting for the number of predictors
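The formula can be checked against R's own output; a sketch on simulated data:

```r
set.seed(2)
n <- 100; k <- 3                 # n observations, k predictors
X <- matrix(rnorm(n * k), n, k)
y <- 1 + X[, 1] + rnorm(n)

m <- lm(y ~ X)
r2 <- summary(m)$r.squared

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
all.equal(adj_manual, summary(m)$adj.r.squared)  # TRUE
```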
In-class practice
• Q3. Draw a scatter plot including the fitted values of each observation.
Statistical vs. Managerial Significance
• What statisticians care about may be different from what managers care about
• Prioritize managerial significance over statistical significance
R Program Practice
Data Description
• Module4_Amusement_Survey.csv
• head(survey)
• summary(survey)
• str(survey)
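Since the course CSV is not reproduced here, the sketch below mimics its structure with a small simulated frame (column names assumed from later slides) to show what the three inspection calls report:

```r
# Hypothetical stand-in for read.csv("Module4_Amusement_Survey.csv")
set.seed(3)
survey <- data.frame(
  weekend   = sample(c("yes", "no"), 20, replace = TRUE),
  num.child = sample(0:4, 20, replace = TRUE),
  distance  = rexp(20, rate = 1 / 50),
  rides     = round(runif(20, 50, 100)),
  overall   = round(runif(20, 40, 100))
)

head(survey)     # first six rows
summary(survey)  # per-column summaries (min, quartiles, mean, max)
str(survey)      # structure: dimensions, column types, sample values
```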
Preliminary Data Analysis
• Scatterplot
• scatterplotMatrix(survey)
• All variables need to be numeric
• Relationship between two variables
• Histogram
• Findings
• Univariate distribution: Diagonal plots
• The linear regression analysis requires all (continuous)
variables to be multivariate normal
• Distance variable is highly skewed
• Log-transform the skewed variable
• survey$log.dist = log(survey$dist)
• Note: Log-transformation is applicable only
for continuous variables
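A sketch of why the log transform helps: for a simulated right-skewed distance variable, skewness shrinks sharply after logging (the skewness helper is ad hoc, not from a package):

```r
set.seed(4)
dist <- rexp(500, rate = 1 / 50)  # strongly right-skewed, like distance
log.dist <- log(dist)             # only valid for positive values

# Simple moment-based skewness
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew(dist)      # large and positive
skew(log.dist)  # much closer to zero
```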
Preliminary Data Analysis
• Scatterplot
• Findings
• Bivariate relationship: Off-Diagonal plots
• Elliptical shape
• High correlation between two variables
• Satisfaction scores show high correlations
with one another
• Possible multicollinearity problem
among independent variables
Preliminary Data Analysis
• Bivariate plots
• plot(overall~weekend, data=survey, xlab="Weekend", ylab="Overall Satisfaction")
• plot(overall~num.child, data=survey, xlab="Number of Children", ylab="Overall Satisfaction")
• plot(overall~distance, data=survey, xlab="Distance", ylab="Overall Satisfaction")
Preliminary Data Analysis
• Correlation matrix
• cor(survey[ , c(2, 4:9)])
• corrplot.mixed(cor(survey[ , c(2, 4:9)]), upper="ellipse")
• Findings
• Correlation between overall (dependent variable) and independent variables
• Positively correlated
• Elliptical shape and blue color
• Correlation among independent variables
• Positively correlated
• Clean and rides, clean and games, rides and games
Preliminary Data Analysis
• Pearson correlation and significance testing
• cor.test(survey$num.child, survey$distance)
• corr.test(survey[ , c(2, 4:9)])
• Findings
• Overall satisfaction and the predictor variables show positive and significant correlations
• Some predictor variables show significantly high positive correlations
• clean & rides
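A sketch of cor.test on simulated data (base R; corr.test from the psych package runs the same test over a whole matrix of variables):

```r
set.seed(5)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

ct <- cor.test(x, y)  # Pearson correlation with a significance test
unname(ct$estimate)   # r, the sample correlation
ct$p.value            # < 0.05 here, so the correlation is significant
```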
In-class practice
• Estimation result
• m1$fitted.values
• Computing R^2
• cor(survey$overall, survey$rides)^2
• Only for the simple linear regression
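The identity R^2 = cor(y, x)^2 holds only with a single predictor; a quick check on simulated data:

```r
set.seed(6)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

m1 <- lm(y ~ x)
# Squared Pearson correlation equals R^2 for simple regression
all.equal(summary(m1)$r.squared, cor(y, x)^2)  # TRUE
```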
Prediction
• You can find the predicted overall satisfaction score from the linear regression
model object
• m1$fitted.values
• m1$fitted.values[10]
• Estimation results
• Prediction
• predict(m2, survey[1:10,])
• fitted(m2)[1:10]
• m2$fitted.values[10]
• m2$coefficients[1] + 100*m2$coefficients[2] + 100*m2$coefficients[3] + 100*m2$coefficients[4] + 100*m2$coefficients[5]
• coef(m2)["(Intercept)"] + coef(m2)["rides"]*100 + coef(m2)["games"]*100 + coef(m2)["wait"]*100 + coef(m2)["clean"]*100
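To confirm the manual computation matches predict(), here is a sketch on a simulated frame (the real m2 is fit on the course data; the column names here are assumed from the slides):

```r
set.seed(7)
survey <- data.frame(rides = runif(50, 50, 100), games = runif(50, 50, 100),
                     wait  = runif(50, 50, 100), clean = runif(50, 50, 100))
survey$overall <- with(survey, 10 + 0.3 * rides + 0.1 * games +
                                 0.2 * wait + 0.3 * clean + rnorm(50))
m2 <- lm(overall ~ rides + games + wait + clean, data = survey)

# Manual prediction at rides = games = wait = clean = 100 ...
manual <- coef(m2)[["(Intercept)"]] + 100 * sum(coef(m2)[-1])
# ... agrees with predict() on the same profile
auto <- predict(m2, data.frame(rides = 100, games = 100, wait = 100, clean = 100))
all.equal(manual, unname(auto))  # TRUE
```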