MKTG 631

Marketing-Based Analytics
Module 5 – Marketing Mix Model I

Jinhee Huh
Linear Regression
Linear Regression Examples

• Is willingness to pay for an iPhone related to the number of Apple products already owned?

• How does product price affect sales?

• If a person rates going to the movie theater as more important, will she go to the movie theater more often?
Scatterplot and Variable Relationships

• Which scatterplot describes a linear relationship?

[Two scatterplots of y vs. x: left panel shows a non-linear relationship; right panel shows a linear relationship]
Scatterplot and Variable Relationships
[Two scatterplots of $ for iPad (0–1000) vs. # of Apple products owned (0–6): left panel shows a positive relationship, right panel a negative relationship]


[Scatterplot of $ for iPad (0–1000) vs. # of Apple products owned (0–6): unclear (non-monotonic) relationship]
Tools to Analyze Linear Relationships

• Bivariate (between Y and one X): correlation, simple linear regression
• Between Y and more than one X: multiple linear regression

• Correlation
• Cannot make predictions
• Simple regression
• Assumes a relationship between X and Y
• But can be powerful in terms of prediction and explanation
• Relies on a set of assumptions
Correlation

• Quantifies the degree to which two continuous variables are linearly related

• The correlation coefficient ranges from -1 to +1

• Two Attributes
• Direction: Are X and Y positively or negatively related?
• Strength: Is the relationship weak, moderate, or strong?
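As a quick sketch of what the correlation coefficient measures, here is Pearson's r computed directly from its definition in plain Python (the deck's hands-on work uses Excel and R; this block and its data are purely illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: $ willing to pay for an iPad vs. # of Apple products owned
owned = [0, 1, 2, 3, 4, 5]
wtp = [150, 280, 400, 520, 610, 740]
r = pearson_r(owned, wtp)  # close to +1: strong positive direction
```

The sign of r gives the direction of the relationship; its absolute value gives the strength.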
Pearson Correlation – Direction
[Two scatterplots: left panel labeled r > 0, right panel labeled r < 0]

Positive: Y increases as X increases. Negative: Y decreases as X increases.

• Question: which of the two plots possibly depicts the relationship between
• Sales and price?
• Sales and advertising?
Correlation – Strength
• Determined by the absolute value of r (-1 ≤ r ≤ 1)
• The general guideline of the effect size for r:

• But the thresholds may vary by applications—you should let your business application
determine what should be considered as a strong correlation

• Note: Strength of a correlation coefficient ≠ significance of a correlation coefficient
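The strength/significance distinction can be made concrete: the usual test of H0: ρ = 0 uses t = r·sqrt(n − 2) / sqrt(1 − r²), so the same weak r can be insignificant in a small sample and highly significant in a large one. A small illustration (hypothetical numbers):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Same weak correlation (r = 0.2), different sample sizes
t_small = t_for_r(0.2, 12)    # ~0.65, well below the ~2.2 critical value at df = 10
t_large = t_for_r(0.2, 1002)  # ~6.45, highly significant, yet r is still weak
```

Significance tells you the relationship is unlikely to be zero; it does not tell you the relationship is strong.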


Correlation Is NOT Causation

A strong correlation between X and Y does NOT imply that X causes Y, or that Y causes X.

• Significant and positive correlation between exercise and skin cancer: does that mean people are more likely to get skin cancer if they exercise?
Linear Regression

• With a linear regression, we can quantify the impact of X on Y and make predictions about Y

[Scatterplot of $ for iPad (0–1000) vs. # of Apple products owned (0–6)]
Simple Linear Regression

• Only one independent variable, x


• Relationship between x and y is described by a linear function
y = a + b·x + e

where a is the intercept, b is the slope, and e is the random error (residual)

• For any given x, we can predict the y value:

ŷ = a + b·x
Meaning of Coefficients

• Intercept, a
• The estimated average value of y when x is zero:

ŷ = a + b·0 = a

• Slope, b
• The estimated change in the average value of y as a result of a one-unit change in x:

ŷ = a + b·x
ŷ' = a + b·(x + 1)
ŷ' − ŷ = b
Meaning of Coefficients

[Diagram of the fitted line ŷ = b0 + b1·x: for an observation at x*, the residual e is the gap between the observed value of y and the predicted value ŷ; the slope b1 is the change in ŷ per one unit of x; the intercept b0 is the value of ŷ at x = 0]
Characterize the Relationship

• Three Attributes
• Presence: does a systematic relationship exist?
• Statistical significance of the slope coefficient

• Direction: positive or negative?


• Sign of the slope coefficient

• Strength: strong, moderate, weak?


• R-squared
Statistical Significance of Coefficients

• For H0: b = 0 vs. H1: b ≠ 0

• R uses the significance level α = 0.05 unless specified otherwise

• Test statistic: t = b̂ / SE(b̂)

• t follows a t distribution with n − 2 degrees of freedom


Overall Model Significance
• Test of whether or not the linear regression model provides a better fit to a dataset
than a model with no predictor variables

• For H0: b1 = b2 = … = bk = 0 vs. H1: at least one bj ≠ 0

• Test statistic: F = MSR / MSE

• F follows an F distribution with k and n − k − 1 degrees of freedom


Overall Model Fit

• R-squared
• Proportion of the total variation in the dependent variable that is explained by the model: R² = 1 − SSE/SST

• The R-square statistic summarizes the overall model goodness of fit

• R-square is between 0 and 1

• Higher values indicate better fit
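R² compares the model's squared errors to the total variation around the mean. A small Python sketch with hypothetical fitted values:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: share of total variation in y explained by the model."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

y = [3.0, 5.0, 7.0, 9.0]
y_hat = [3.5, 4.5, 7.5, 8.5]  # hypothetical fitted values
r2 = r_squared(y, y_hat)      # 1 - 1/20 = 0.95
```

A perfect fit (ŷ = y everywhere) gives R² = 1; a model that predicts only the mean of y gives R² = 0.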


In-class practice

• Q1. Using Excel, estimate regression models

• Simple regression: Y – Quantity sold; X – Price

• Q2. Interpret the regression models

• Significance of individual coefficients
• Overall model fit

• Q3. Calculate the square of the correlation coefficient (Multiple R) and compare it to the R²

• Q4. Draw a scatter plot including the fitted values of each observation.
Multiple Regression
y  a  b1  x1  b2  x2  ...  bk  xk  e

• Multiple regression includes more than one predictor

• The interpretation of the model is conceptually very similar to that of a simple linear regression
• Overall model fit
• t test
• F test
• R-squared (how much the model can explain)
Multiple Regression
y  a  b1  x1  b2  x2  ...  bk  xk  e

• Hypotheses for coefficient b1:

• H0: b1 = 0 (no relationship between y and the corresponding independent variable)
• H1: b1 ≠ 0 (the relationship does exist)

• The t-test statistic: t = b̂1 / SE(b̂1)

• If p < 0.05, the slope is significant.
• Interpretation of b1 (when x1 increases by 1, how much does y change?)
• With a 1-unit increase in x1, y is expected to increase by b1 (given that x2, …, xk are constant)
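The "holding the other predictors constant" interpretation can be checked mechanically: with fixed coefficients, raising x1 by one unit while x2 stays put changes the prediction by exactly b1. A tiny sketch with hypothetical coefficient values:

```python
# Hypothetical fitted multiple regression: y_hat = a + b1*x1 + b2*x2
a, b1, b2 = 10.0, 2.5, -1.2

def predict(x1, x2):
    """Prediction from the fitted equation."""
    return a + b1 * x1 + b2 * x2

# Holding x2 constant at 7, a 1-unit increase in x1 raises y_hat by exactly b1
delta = predict(4, 7) - predict(3, 7)  # equals b1 = 2.5
```

This is why each slope in a multiple regression is a partial effect: it describes the change in ŷ per unit of its own predictor, with the others fixed.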
Multiple Regression: F test

• The F-Test in the ANOVA table can still be used to assess the overall significance of the
model
• Shows if there is a linear relationship between y and all x variables considered together
• Hypotheses for the F test
• H0: b1 = b2 = … = bk = 0 (none of the independent variables affects y)
• H1: At least one bj ≠ 0 (at least one independent variable affects y)

• If p < 0.05, we reject the null hypothesis. In other words, at least one predictor is significant, but the F test alone does not tell us which one.
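The overall F statistic can also be written in terms of R², which makes the link between model fit and model significance explicit: F = (R²/k) / ((1 − R²)/(n − k − 1)). A quick sketch with hypothetical values:

```python
def f_statistic(r2, n, k):
    """Overall F test from R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical: R^2 = 0.56 with n = 500 observations and k = 4 predictors
f = f_statistic(0.56, 500, 4)  # a large F leads to rejecting H0 that all slopes are zero
```

Even a modest R² can produce a large F when n is large, which is another reminder that statistical significance and strength of fit are different questions.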
Multiple Regression: R-squared

• The R-squared has the same interpretation as in Simple Linear Regression

• R² never decreases when a new independent variable is added to the model

• Even if an insignificant independent variable is added to the model, the R² would still increase
• As a result, R² always favors models with more independent variables
Multiple Regression: Adjusted R-square

 n 1 
R 2
Adjusted  1  1  R  
2

 n  k 1 

• Adjusted R² shows the proportion of variation in the DV explained by all independent variables, adjusted for the number of independent variables (k) used
• Penalizes excessive use of unimportant independent variables
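The penalty is easy to see numerically: a useless extra predictor can nudge R² up while adjusted R² goes down. A sketch of the formula with hypothetical numbers:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1): penalizes extra predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: adding a 6th, nearly useless predictor raises R^2 only slightly
a1 = adjusted_r2(0.200, 50, 5)  # 5-predictor model
a2 = adjusted_r2(0.205, 50, 6)  # 6-predictor model: higher R^2, lower adjusted R^2
```

This mirrors the Model 1 vs. Model 2 comparison on the next slide: when an added variable buys too little extra R², adjusted R² falls.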
Multiple Regression: Adjusted R-square

                 Model 1     Model 2
R-square           0.20   <    0.21
Adjusted R-square  0.18   >    0.17

• Use the adjusted R-square and conclude that Model 1 has a better fit after
adjusting for the number of predictors
In-class practice

• Q1. Using Excel, estimate regression models

• Multiple regression: Y – Quantity sold; X – Price and Advertising

• Q2. Interpret the regression models


• Significance of individual coefficient
• Significance of overall model (multiple regression only)
• Overall model fit

• Q3. Draw a scatter plot including the fitted values of each observation.
Statistical vs. Managerial Significance

• What statisticians care about may be different from what managers care about

• Statistical significance is a “bar” to cross

• Managerial significance is what really matters

• You need to put the managerial significance before the statistical significance
R program Practice
Data Description
• Module4_Amusement_Survey.csv

• Hypothetical survey data of visitors to an amusement park


• Weekend: whether the respondent visited during weekend
• Num.child: number of children brought
• Distance: Distance to the park
• Satisfaction scores (0 – 100)
• Rides, games, waiting time, cleanliness
• Overall
Data Description

• head(survey)

• summary(survey)

• str(survey)
Preliminary Data Analysis
• Scatterplot matrix
• scatterplotMatrix(survey)  # from the car package
• All variables need to be numeric
• Relationship between two variables
• Histogram

• Findings
• Univariate distribution: Diagonal plots
• The linear regression analysis requires all (continuous)
variables to be multivariate normal
• Distance variable is highly skewed
• Log-transform the skewed variable

• survey$log.dist <- log(survey$distance)
• Note: Log-transformation is applicable only
for continuous variables
Preliminary Data Analysis
• Scatterplot

• Findings
• Bivariate relationship: Off-Diagonal plots
• Elliptical shape
• High correlation between two variables
• Satisfaction scores shows the high
correlations
• Possible multicollinearity problem
among independent variables
Preliminary Data Analysis
• Bivariate plots
• plot(overall~weekend, data=survey, xlab="Weekend Visit", ylab="Overall Satisfaction")
• plot(overall~num.child, data=survey, xlab="Number of Children", ylab="Overall Satisfaction")
• plot(overall~distance, data=survey, xlab="Distance to Park", ylab="Overall Satisfaction")
Preliminary Data Analysis

• Correlation matrix
• cor(survey[ , c(2, 4:9)])
• corrplot.mixed(cor(survey[ , c(2, 4:9)]), upper="ellipse")  # from the corrplot package

• Findings
• Correlation between overall (dependent variable) and independent variables
• Positively correlated
• Elliptical shape and blue color
• Correlation among independent variables
• Positively correlated
• Clean and rides, clean and games, rides and games
Preliminary Data Analysis
• Pearson correlation and significance testing
• cor.test(survey$num.child, survey$distance)
• corr.test(survey[ , c(2, 4:9)])  # from the psych package

• Findings
• Overall satisfaction and the predictor variables show positive and significant correlations
• Some predictor variables show significantly high positive correlations
• Clean & ride
In-class practice

• Estimate the following regression models and interpret the models.


Estimating Simple Linear Regression

• Linear regression estimation and result


• m1 <- lm(overall ~ rides, data=survey)
• summary(m1)

• Estimation result

• The coefficient is significant.


• When rides satisfaction increases by 1, the overall satisfaction score increases by 1.703
• The overall linear model is also significant
• The model that has rides score for the
independent variable explains 34% of the
variations of overall score
Estimating Simple Linear Regression

• Plot the estimated linear regression line


• plot(overall~rides, data=survey,
xlab="Satisfaction with Rides", ylab="Overall
Satisfaction")
• abline(m1, col='blue')

• The plot shows the difference between the


observed value and the predicted value of overall
based on the estimated linear regression model
Simple Linear Regression Objects
• List of important regression objects
• m1$coefficients
• Estimated linear regression coefficients
• m1$residuals
• Residuals: observed y minus fitted values
• m1$fitted.values
• Predicted (fitted) values of y for each observation

• Computing R^2
• cor(survey$overall, survey$rides)^2
• Only for the simple linear regression
Prediction

• You can find the predicted overall satisfaction score from the linear regression
model object
• m1$fitted.values
• m1$fitted.values[10]

• However, you can also compute a prediction by hand (e.g., for rides = 10)

• m1$coefficients[1] + 10*m1$coefficients[2]
• -94.9622 + 10 * 1.7033
Estimating Multiple Linear Regression

• Linear regression estimation and results


• m2 <- lm(overall ~ rides + games + wait + clean,
data=survey)
• summary(m2)

• Estimation results

• All coefficients are significant at 0.05 level


• The overall model is significant
• About 56% of the variation in the overall
satisfaction is explained by variations in the
predictor variables
In-class practice

• Estimate the following regression models and interpret the models.


Estimating Multiple Linear Regression
• Estimated coefficient importance
• coefplot(m2, intercept=FALSE, outerCI=1.96, lwdOuter=1.5, ylab="Rating of Feature", xlab="Association with Overall Satisfaction")  # from the coefplot package
• Importance: Clean > wait > rides > games

• Prediction
• predict(m2, survey[1:10,])
• fitted(m2)[1:10]
• m2$fitted.values[10]
• m2$coefficients[1] + 100*m2$coefficients[2] + 100*m2$coefficients[3] + 100*m2$coefficients[4] + 100*m2$coefficients[5]
• coef(m2)["(Intercept)"] + coef(m2)["rides"]*100 + coef(m2)["games"]*100 + coef(m2)["wait"]*100 + coef(m2)["clean"]*100
