MKTG 631

Marketing-Based Analytics
Module 5 – Marketing Mix Model I

Jinhee Huh
Linear Regression
Linear Regression Examples

• Is willingness to pay for an iPhone related to the number of Apple products already owned?

• How does product price affect sales?

• If a person rates going to the movie theater as more important, will she go to the movie theater more often?
Scatterplot and Variable Relationships

• Which scatterplot describes a linear relationship?

[Two scatterplots of y vs. x: left panel shows a non-linear relationship; right panel shows a linear relationship]
Scatterplot and Variable Relationships
[Two scatterplots of $ for iPad (0–1000) vs. # of Apple products owned (0–6): left panel shows a positive relationship, right panel a negative relationship]


[Scatterplot of $ for iPad (0–1000) vs. # of Apple products owned (0–6): unclear (non-monotonic) relationship]
Tools to Analyze Linear Relationships

• Bivariate (between Y and one X): correlation, simple linear regression
• Between Y and more than one X: multiple linear regression

• Correlation
• Cannot make predictions
• Simple regression
• Assumes a relationship between X and Y
• But can be powerful in terms of prediction and explanation
• Relies on a set of assumptions
Correlation

• Quantifies the degree to which two continuous variables are linearly related

• The correlation coefficient ranges from -1 to +1

• Two Attributes
• Direction: Are X and Y positively or negatively related?
• Strength: Is the relationship weak, moderate, or strong?
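As a quick sketch of what the correlation coefficient measures, here is Pearson's r computed directly from its definition in plain Python (the deck's hands-on work uses Excel and R; this block and its data are purely illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: $ willing to pay for an iPad vs. # of Apple products owned
owned = [0, 1, 2, 3, 4, 5]
wtp = [150, 280, 400, 520, 610, 740]
r = pearson_r(owned, wtp)  # close to +1: strong positive direction
```

The sign of r gives the direction of the relationship; its absolute value gives the strength.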
Pearson Correlation – Direction
[Two scatterplots: left panel labeled r > 0, right panel labeled r < 0]

Positive: Y increases as X increases. Negative: Y decreases as X increases.

• Question: which of the two plots possibly depicts the relationship between
• Sales and price?
• Sales and advertising?
Correlation – Strength
• Determined by the absolute value of r (-1 ≤ r ≤ 1)
• The general guideline of the effect size for r:

• But the thresholds may vary by applications—you should let your business application
determine what should be considered as a strong correlation

• Note: Strength of a correlation coefficient ≠ significance of a correlation coefficient
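The strength/significance distinction can be made concrete: the usual test of H0: ρ = 0 uses t = r·sqrt(n − 2) / sqrt(1 − r²), so the same weak r can be insignificant in a small sample and highly significant in a large one. A small illustration (hypothetical numbers):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Same weak correlation (r = 0.2), different sample sizes
t_small = t_for_r(0.2, 12)    # ~0.65, well below the ~2.2 critical value at df = 10
t_large = t_for_r(0.2, 1002)  # ~6.45, highly significant, yet r is still weak
```

Significance tells you the relationship is unlikely to be zero; it does not tell you the relationship is strong.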


Correlation Is NOT Causation

A strong correlation between X and Y does NOT imply that X causes Y, or that Y causes X.

• Significant and positive correlation between exercise and skin cancer: does that mean people are more likely to get skin cancer if they exercise?
Linear Regression

• With a linear regression, we can quantify the impact of X on Y and make predictions about Y

[Scatterplot of $ for iPad (0–1000) vs. # of Apple products owned (0–6)]
Simple Linear Regression

• Only one independent variable, x


• Relationship between x and y is described by a linear function
y = a + b·x + e

where a is the intercept, b is the slope, and e is the random error (residual)

• For any given x, we can predict the y value:

ŷ = a + b·x
Meaning of Coefficients

• Intercept, a
• The estimated average value of y when x is zero:

ŷ = a + b·0 = a

• Slope, b
• The estimated change in the average value of y as a result of a one-unit change in x:

ŷ = a + b·x
ŷ' = a + b·(x + 1)
ŷ' − ŷ = b
Meaning of Coefficients

[Diagram of the fitted line ŷ = b0 + b1·x: for an observation at x*, the residual e is the gap between the observed value of y and the predicted value ŷ; the slope b1 is the change in ŷ per one unit of x; the intercept b0 is the value of ŷ at x = 0]
Characterize the Relationship

• Three Attributes
• Presence: does a systematic relationship exist?
• Statistical significance of the slope coefficient

• Direction: positive or negative?


• Sign of the slope coefficient

• Strength: strong, moderate, weak?


• R-squared
Statistical Significance of Coefficients

• For H0: b = 0 vs. H1: b ≠ 0

• R uses the significance level α = 0.05 unless specified otherwise

• Test statistic: t = b̂ / SE(b̂)

• t follows a t distribution with n − 2 degrees of freedom


Overall Model Significance
• Test of whether or not the linear regression model provides a better fit to a dataset
than a model with no predictor variables

• For H0: b1 = b2 = … = bk = 0 vs. H1: at least one bj ≠ 0

• Test statistic: F = MSR / MSE

• F follows an F distribution with k and n − k − 1 degrees of freedom


Overall Model Fit

• R-squared
• Proportion of the total variation in the dependent variable that is explained by the model: R² = 1 − SSE/SST

• The R-square statistic summarizes the overall model goodness of fit

• R-square is between 0 and 1

• Higher values indicate better fit
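R² compares the model's squared errors to the total variation around the mean. A small Python sketch with hypothetical fitted values:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: share of total variation in y explained by the model."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

y = [3.0, 5.0, 7.0, 9.0]
y_hat = [3.5, 4.5, 7.5, 8.5]  # hypothetical fitted values
r2 = r_squared(y, y_hat)      # 1 - 1/20 = 0.95
```

A perfect fit (ŷ = y everywhere) gives R² = 1; a model that predicts only the mean of y gives R² = 0.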


In-class practice

• Q1. Using Excel, estimate regression models

• Simple regression: Y – Quantity sold; X – Price

• Q2. Interpret the regression models

• Significance of individual coefficients
• Overall model fit

• Q3. Calculate the square of the correlation coefficient (Multiple R) and compare it to the R²

• Q4. Draw a scatter plot including the fitted values of each observation.
Multiple Regression
y  a  b1  x1  b2  x2  ...  bk  xk  e

• Multiple regression includes more than one predictor

• The interpretation of the model is conceptually very similar to that of a simple linear regression
• Overall model fit
• t test
• F test
• R-squared (how much the model can explain)
Multiple Regression
y  a  b1  x1  b2  x2  ...  bk  xk  e

• Hypotheses for coefficient b1:

• H0: b1 = 0 (no relationship between y and the corresponding independent variable)
• H1: b1 ≠ 0 (the relationship does exist)

• The t-test statistic: t = b̂1 / SE(b̂1)

• If p < 0.05, the slope is significant.
• Interpretation of b1 (when x1 increases by 1, how much does y change?)
• With a 1-unit increase in x1, y is expected to increase by b1 (given that x2, …, xk are constant)
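The "holding the other predictors constant" interpretation can be checked mechanically: with fixed coefficients, raising x1 by one unit while x2 stays put changes the prediction by exactly b1. A tiny sketch with hypothetical coefficient values:

```python
# Hypothetical fitted multiple regression: y_hat = a + b1*x1 + b2*x2
a, b1, b2 = 10.0, 2.5, -1.2

def predict(x1, x2):
    """Prediction from the fitted equation."""
    return a + b1 * x1 + b2 * x2

# Holding x2 constant at 7, a 1-unit increase in x1 raises y_hat by exactly b1
delta = predict(4, 7) - predict(3, 7)  # equals b1 = 2.5
```

This is why each slope in a multiple regression is a partial effect: it describes the change in ŷ per unit of its own predictor, with the others fixed.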
Multiple Regression: F test

• The F-Test in the ANOVA table can still be used to assess the overall significance of the
model
• Shows if there is a linear relationship between y and all x variables considered together
• Hypotheses for the F test
• H0: b1 = b2 = … = bk = 0 (none of the independent variables affects y)
• H1: At least one bj ≠ 0 (at least one independent variable affects y)

• If p < 0.05, we reject the null hypothesis. In other words, at least one predictor is significant, but the F test alone does not tell us which one.
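The overall F statistic can also be written in terms of R², which makes the link between model fit and model significance explicit: F = (R²/k) / ((1 − R²)/(n − k − 1)). A quick sketch with hypothetical values:

```python
def f_statistic(r2, n, k):
    """Overall F test from R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical: R^2 = 0.56 with n = 500 observations and k = 4 predictors
f = f_statistic(0.56, 500, 4)  # a large F leads to rejecting H0 that all slopes are zero
```

Even a modest R² can produce a large F when n is large, which is another reminder that statistical significance and strength of fit are different questions.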
Multiple Regression: R-squared

• The R-squared has the same interpretation as in Simple Linear Regression

• R² never decreases when a new independent variable is added to the model

• Even if an insignificant independent variable is added to the model, the R² would still increase
• As a result, R² always favors models with more independent variables
Multiple Regression: Adjusted R-square

 n 1 
R 2
Adjusted  1  1  R  
2

 n  k 1 

• Adjusted R² shows the proportion of variation in the DV explained by all independent variables, adjusted for the number of independent variables (k) used
• Penalizes excessive use of unimportant independent variables
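The penalty is easy to see numerically: a useless extra predictor can nudge R² up while adjusted R² goes down. A sketch of the formula with hypothetical numbers:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1): penalizes extra predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: adding a 6th, nearly useless predictor raises R^2 only slightly
a1 = adjusted_r2(0.200, 50, 5)  # 5-predictor model
a2 = adjusted_r2(0.205, 50, 6)  # 6-predictor model: higher R^2, lower adjusted R^2
```

This mirrors the Model 1 vs. Model 2 comparison on the next slide: when an added variable buys too little extra R², adjusted R² falls.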
Multiple Regression: Adjusted R-square

                 Model 1     Model 2
R-square           0.20   <    0.21
Adjusted R-square  0.18   >    0.17

• Use the adjusted R-square and conclude that Model 1 has a better fit after
adjusting for the number of predictors
In-class practice

• Q1. Using Excel, estimate regression models

• Multiple regression: Y – Quantity sold; X – Price and Advertising

• Q2. Interpret the regression models


• Significance of individual coefficient
• Significance of overall model (multiple regression only)
• Overall model fit

• Q3. Draw a scatter plot including the fitted values of each observation.
Statistical vs. Managerial Significance

• What statisticians care about may be different from what managers care about

• Statistical significance is a “bar” to cross

• Managerial significance is what really matters

• You need to put the managerial significance before the statistical significance
R program Practice
Data Description
• Module4_Amusement_Survey.csv

• Hypothetical survey data of visitors to an amusement park


• Weekend: whether the respondent visited during weekend
• Num.child: number of children brought
• Distance: Distance to the park
• Satisfaction scores (0 – 100)
• Rides, games, waiting time, cleanliness
• Overall
Data Description

• head(survey)

• summary(survey)

• str(survey)
Preliminary Data Analysis
• Scatterplot matrix
• scatterplotMatrix(survey)  # from the car package
• All variables need to be numeric
• Relationship between two variables
• Histogram

• Findings
• Univariate distribution: Diagonal plots
• The linear regression analysis requires all (continuous)
variables to be multivariate normal
• Distance variable is highly skewed
• Log-transform the skewed variable

• survey$log.dist <- log(survey$distance)
• Note: Log-transformation is applicable only
for continuous variables
Preliminary Data Analysis
• Scatterplot

• Findings
• Bivariate relationship: Off-Diagonal plots
• Elliptical shape
• High correlation between two variables
• Satisfaction scores shows the high
correlations
• Possible multicollinearity problem
among independent variables
Preliminary Data Analysis
• Bivariate plots
• plot(overall~weekend, data=survey, xlab="Weekend Visit", ylab="Overall Satisfaction")
• plot(overall~num.child, data=survey, xlab="Number of Children", ylab="Overall Satisfaction")
• plot(overall~distance, data=survey, xlab="Distance to Park", ylab="Overall Satisfaction")
Preliminary Data Analysis

• Correlation matrix
• cor(survey[ , c(2, 4:9)])
• corrplot.mixed(cor(survey[ , c(2, 4:9)]), upper="ellipse")  # from the corrplot package

• Findings
• Correlation between overall (dependent variable) and independent variables
• Positively correlated
• Elliptical shape and blue color
• Correlation among independent variables
• Positively correlated
• Clean and rides, clean and games, rides and games
Preliminary Data Analysis
• Pearson correlation and significance testing
• cor.test(survey$num.child, survey$distance)
• corr.test(survey[ , c(2, 4:9)])  # from the psych package

• Findings
• Overall satisfaction and the predictor variables show positive and significant correlations
• Some predictor variables show significantly high positive correlations
• Clean & ride
In-class practice

• Estimate the following regression models and interpret the models.


Estimating Simple Linear Regression

• Linear regression estimation and result


• m1 <- lm(overall ~ rides, data=survey)
• summary(m1)

• Estimation result

• The coefficient is significant.


• When rides satisfaction increases by 1, the overall satisfaction score increases by 1.703
• The overall linear model is also significant
• The model that has rides score for the
independent variable explains 34% of the
variations of overall score
Estimating Simple Linear Regression

• Plot the estimated linear regression line


• plot(overall~rides, data=survey,
xlab="Satisfaction with Rides", ylab="Overall
Satisfaction")
• abline(m1, col='blue')

• The plot shows the difference between the


observed value and the predicted value of overall
based on the estimated linear regression model
Simple Linear Regression Objects
• List of important regression objects
• m1$coefficients
• Estimated linear regression coefficients
• m1$residuals
• Residuals: observed y minus fitted values
• m1$fitted.values
• Predicted (fitted) values of y for each observation

• Computing R^2
• cor(survey$overall, survey$rides)^2
• Only for the simple linear regression
Prediction

• You can find the predicted overall satisfaction score from the linear regression
model object
• m1$fitted.values
• m1$fitted.values[10]

• However, you can also compute a prediction by hand (e.g., for rides = 10)

• m1$coefficients[1] + 10*m1$coefficients[2]
• -94.9622 + 10 * 1.7033
Estimating Multiple Linear Regression

• Linear regression estimation and results


• m2 <- lm(overall ~ rides + games + wait + clean,
data=survey)
• summary(m2)

• Estimation results

• All coefficients are significant at 0.05 level


• The overall model is significant
• About 56% of the variation in the overall
satisfaction is explained by variations in the
predictor variables
In-class practice

• Estimate the following regression models and interpret the models.


Estimating Multiple Linear Regression
• Estimated coefficient importance
• coefplot(m2, intercept=FALSE, outerCI=1.96, lwdOuter=1.5, ylab="Rating of Feature", xlab="Association with Overall Satisfaction")  # from the coefplot package
• Importance: Clean > wait > rides > games

• Prediction
• predict(m2, survey[1:10,])
• fitted(m2)[1:10]
• m2$fitted.values[10]
• m2$coefficients[1] + 100*m2$coefficients[2] + 100*m2$coefficients[3] + 100*m2$coefficients[4] + 100*m2$coefficients[5]
• coef(m2)["(Intercept)"] + coef(m2)["rides"]*100 + coef(m2)["games"]*100 + coef(m2)["wait"]*100 + coef(m2)["clean"]*100
