Brand Sentiment Unter Example
2) How do consumers currently feel about different features of the UnterEat service?
3) Did consumer attitudes towards specific features change over time? Could this
change in attitudes explain the decrease in UnterEat’s market share?
2.1 First look at the data
The code below loads the data (you need to switch to your working directory before
running the code). Have a look at the data set and make sure you understand what the
various variables represent (they are all explained in the text above).
Scroll down in the spreadsheet to row 1,000. You will notice (as explained above) that the
brand_sentiment variable is only available for the first 1,000 rows. These are the 1,000
tweets for which the variable was manually coded.
library(readr)
sentiment <- read_csv("brand_sentiment_analysis.csv")
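# Fit the sentiment regression (formula as shown in the Call line of the output below)
sentiment_reg <- lm(brand_sentiment ~ excited + content + neutral + disappointed, data=sentiment)
summary(sentiment_reg)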
##
## Call:
## lm(formula = brand_sentiment ~ excited + content + neutral +
## disappointed, data = sentiment)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.80963 -0.25527 0.01035 0.25550 0.70011
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.013536 0.038413 182.58 <2e-16 ***
## excited 0.350185 0.004862 72.03 <2e-16 ***
## content 0.148169 0.006844 21.65 <2e-16 ***
## neutral -0.084480 0.003461 -24.41 <2e-16 ***
## disappointed -0.287481 0.003417 -84.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3108 on 995 degrees of freedom
## (49000 observations deleted due to missingness)
## Multiple R-squared: 0.9592, Adjusted R-squared: 0.959
## F-statistic: 5847 on 4 and 995 DF, p-value: < 2.2e-16
Interpret all regression coefficients. Do the sign and magnitude of the coefficients
appear intuitively correct?
ANSWER: The intercept represents brand sentiment when all four variables are equal to
zero. Each of the 4 slope coefficients measures the impact of a one-unit change in the
respective variable on overall brand sentiment (holding everything else constant). Signs
and magnitudes are intuitive. The two positive variables both have a positive impact, and
“excited” has a stronger positive impact on brand sentiment than “content”.
“Disappointed” has a negative impact, as expected, and “neutral” lies between the other
metrics, as expected.
2.3 Extrapolating brand sentiment
You notice that the sentiment regression has a high R-squared value, so knowledge of
the 4 sentiment variables allows you to predict the overall brand sentiment variable
with a high degree of confidence. In a second step you compute the predicted brand
sentiment variable for the entire data set (i.e. including tweets for which brand
sentiment was not measured).
sentiment$brand_sentiment_fitted <- predict(sentiment_reg, newdata=sentiment)
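The following is a minimal sketch of why this step works, using a small simulated data frame rather than the real tweets. The data frame `toy`, its two predictors and all numbers in it are made up for illustration. It shows that `lm()` fits only on rows where the outcome is available, while `predict()` with `newdata` returns a value for every row because it only needs the predictors.

```r
# Simulated stand-in for the tweet data: 20 rows, outcome coded for the first 8 only
set.seed(1)
toy <- data.frame(excited = runif(20, 0, 10), disappointed = runif(20, 0, 10))
toy$brand_sentiment <- 7 + 0.3 * toy$excited - 0.3 * toy$disappointed +
  rnorm(20, sd = 0.1)
toy$brand_sentiment[9:20] <- NA   # outcome not coded for these rows

# lm() drops rows with a missing outcome (default na.action is na.omit)
toy_reg <- lm(brand_sentiment ~ excited + disappointed, data = toy)
n_used <- nobs(toy_reg)           # number of rows actually used in the fit

# predict() with newdata only needs the predictors, so every row gets a value
toy$brand_sentiment_fitted <- predict(toy_reg, newdata = toy)
n_missing_fitted <- sum(is.na(toy$brand_sentiment_fitted))
```

Here `n_used` counts only the rows with a coded outcome, and `n_missing_fitted` is zero, which mirrors what happens with the 1,000 coded tweets and the full data set.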
Have a look at the newly generated variable inside the sentiment data frame. Explain
how the brand_sentiment_fitted variable is computed. For the tweet in row 1001 of
the data, implement a calculation that derives the predicted brand_sentiment value
for this tweet.
ANSWER: We can obtain predictions by plugging the values for all 4 variables into the
regression formula based on the regression output from question 1.2.
For row 1001, this calculation is as follows:
brand_sentiment_fitted = 7.013536 + 0.350185 * 6 + 0.148169 * 6 - 0.084480 * 8 - 0.287481 * 8 ≈ 7.02797
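The same calculation can be sketched in R. The coefficients below are the rounded values displayed in the regression output, and the predictor values for row 1001 are the ones used in the calculation above. The object names `coefs`, `row_1001` and `fitted_1001` are chosen here for illustration. Because the displayed coefficients are rounded, the result agrees with the value in the text only to about five decimal places.

```r
# Coefficients as displayed (rounded) in the sentiment regression output
coefs <- c(intercept = 7.013536, excited = 0.350185, content = 0.148169,
           neutral = -0.084480, disappointed = -0.287481)

# Values of the four sentiment variables for row 1001, as used in the text
row_1001 <- c(excited = 6, content = 6, neutral = 8, disappointed = 8)

# Intercept plus the sum of coefficient * value over the four predictors
fitted_1001 <- unname(coefs["intercept"] +
                        sum(coefs[names(row_1001)] * row_1001))
round(fitted_1001, 5)
# 7.02797
```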
summary(lm(brand_sentiment_fitted ~ factor(year), data=sentiment))
##
## Call:
## lm(formula = brand_sentiment_fitted ~ factor(year), data = sentiment)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8920 -1.1071 0.0025 1.1085 3.9741
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.33414 0.01507 486.52 <2e-16 ***
## factor(year)2 -0.46779 0.02132 -21.94 <2e-16 ***
## factor(year)3 -0.88183 0.02132 -41.36 <2e-16 ***
## factor(year)4 -1.28014 0.02132 -60.05 <2e-16 ***
## factor(year)5 -1.71589 0.02132 -80.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.507 on 49995 degrees of freedom
## Multiple R-squared: 0.1369, Adjusted R-squared: 0.1369
## F-statistic: 1983 on 4 and 49995 DF, p-value: < 2.2e-16
Interpret all regression coefficients. What do they tell you about the evolution of
brand sentiment over time?
ANSWER:
The intercept represents brand sentiment in year 1. Each of the other coefficients
represents the difference in brand sentiment in a given year relative to year 1. For
example, the factor(year)4 coefficient tells us how much lower brand sentiment is in
year 4 relative to year 1.
The regression output shows a steady (almost linear) decline in brand sentiment over
time.
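To make the "almost linear" pattern concrete, the short sketch below converts the displayed coefficients into implied average sentiment per year and the change from one year to the next. It relies on the fact that, in a regression on year dummies only, the intercept plus a year coefficient equals that year's average. The object names are chosen here for illustration.

```r
# Coefficients from the factor(year) regression output
intercept <- 7.33414   # average fitted sentiment in year 1 (baseline)
year_effects <- c(0, -0.46779, -0.88183, -1.28014, -1.71589)  # years 1 to 5

year_means <- intercept + year_effects   # implied average sentiment per year
yearly_change <- diff(year_means)        # change from one year to the next

round(year_means, 3)
# 7.334 6.866 6.452 6.054 5.618
round(yearly_change, 3)
# -0.468 -0.414 -0.398 -0.436
```

The year-on-year drops are all of similar size (roughly 0.4 to 0.47 points), which is what a near-linear decline looks like.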
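summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==1))
##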
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5088 -0.4755 0.0161 0.4927 3.2424
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.69511 0.01339 425.458 <2e-16 ***
## online_platform 1.08412 0.02063 52.547 <2e-16 ***
## delivery_speed 1.11426 0.02283 48.802 <2e-16 ***
## price 1.22316 0.02321 52.709 <2e-16 ***
## network -1.12400 0.08664 -12.974 <2e-16 ***
## customer_service -0.04239 0.02060 -2.058 0.0396 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7468 on 9994 degrees of freedom
## Multiple R-squared: 0.7526, Adjusted R-squared: 0.7525
## F-statistic: 6082 on 5 and 9994 DF, p-value: < 2.2e-16
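summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==2))
##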
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.64660 -0.46934 0.03274 0.49815 2.77959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.67482 0.01173 483.85 <2e-16 ***
## online_platform 0.95815 0.02302 41.62 <2e-16 ***
## delivery_speed 1.02601 0.02186 46.93 <2e-16 ***
## price 1.21167 0.02181 55.56 <2e-16 ***
## network -0.89795 0.05594 -16.05 <2e-16 ***
## customer_service -0.96529 0.03089 -31.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7197 on 9994 degrees of freedom
## Multiple R-squared: 0.7719, Adjusted R-squared: 0.7718
## F-statistic: 6763 on 5 and 9994 DF, p-value: < 2.2e-16
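summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==3))
##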
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2848 -0.4381 0.0287 0.4448 2.3425
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.68117 0.01034 549.18 <2e-16 ***
## online_platform 0.89522 0.02519 35.54 <2e-16 ***
## delivery_speed 0.94506 0.02058 45.91 <2e-16 ***
## price 1.17022 0.02009 58.25 <2e-16 ***
## network -0.61824 0.04057 -15.24 <2e-16 ***
## customer_service -1.35446 0.02237 -60.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.661 on 9994 degrees of freedom
## Multiple R-squared: 0.8068, Adjusted R-squared: 0.8067
## F-statistic: 8346 on 5 and 9994 DF, p-value: < 2.2e-16
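summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==4))
##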
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.19937 -0.45132 0.01812 0.43233 2.21224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.681612 0.009701 585.65 <2e-16 ***
## online_platform 0.754776 0.029574 25.52 <2e-16 ***
## delivery_speed 0.914528 0.020785 44.00 <2e-16 ***
## price 1.119655 0.019957 56.10 <2e-16 ***
## network -0.606291 0.030151 -20.11 <2e-16 ***
## customer_service -1.505794 0.018287 -82.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6293 on 9994 degrees of freedom
## Multiple R-squared: 0.8272, Adjusted R-squared: 0.8272
## F-statistic: 9571 on 5 and 9994 DF, p-value: < 2.2e-16
summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==5))
##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.02082 -0.47439 0.03021 0.44018 2.13083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.644929 0.009518 593.07 <2e-16 ***
## online_platform 0.663949 0.035850 18.52 <2e-16 ***
## delivery_speed 0.806275 0.022102 36.48 <2e-16 ***
## price 1.117942 0.020647 54.15 <2e-16 ***
## network -0.704435 0.024158 -29.16 <2e-16 ***
## customer_service -1.591611 0.016471 -96.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6222 on 9994 degrees of freedom
## Multiple R-squared: 0.8309, Adjusted R-squared: 0.8308
## F-statistic: 9820 on 5 and 9994 DF, p-value: < 2.2e-16
Based on the regression, does it appear that consumers’ evaluation of specific brand
characteristics changed systematically over time?
ANSWER:
Most coefficients change only moderately over time. The clear exception is customer
service, whose coefficient falls from -0.04 in year 1 to -1.59 in year 5. It appears
that tweets about customer service are becoming increasingly negative over time, and
hence a deterioration of the perceived level of customer service is most likely the
underlying cause of the decline in brand sentiment.
While consumers seem to dislike the small network of restaurants (the coefficient is
negative in every year), their attitude towards this aspect of the company has not
worsened over time.
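One way to compare the five regressions side by side is to collect the slope estimates in a single table. The sketch below copies the estimates from the outputs above by hand. The object names `coef_by_year` and `change` are chosen here for illustration, and the comparison ignores standard errors, so it is a descriptive summary rather than a formal test.

```r
# Slope estimates copied from the five per-year regression outputs
# (columns = years 1 to 5)
coef_by_year <- rbind(
  online_platform  = c(1.08412, 0.95815, 0.89522, 0.754776, 0.663949),
  delivery_speed   = c(1.11426, 1.02601, 0.94506, 0.914528, 0.806275),
  price            = c(1.22316, 1.21167, 1.17022, 1.119655, 1.117942),
  network          = c(-1.12400, -0.89795, -0.61824, -0.606291, -0.704435),
  customer_service = c(-0.04239, -0.96529, -1.35446, -1.505794, -1.591611)
)
colnames(coef_by_year) <- paste0("year", 1:5)

# Change in each coefficient between year 1 and year 5
change <- coef_by_year[, "year5"] - coef_by_year[, "year1"]
round(change, 2)

# Feature with the largest absolute change
names(which.max(abs(change)))
# "customer_service"
```

The customer service coefficient moves by about -1.55 between year 1 and year 5, more than three times the next largest shift among the five features.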