
Relationship Marketing: User-generated Content (Solution)

2. Brand sentiment analysis at UnterEats


You are asked to implement a brand sentiment analysis for UnterEats, a new meal delivery
service that is trying to meet the growing demand for home delivery. The company prides
itself on providing an online platform that is easy to navigate, fast delivery times, and
efficient customer service. One weakness of the company’s profile is a smaller network of
restaurants that are available on the platform relative to other similar services. UnterEats
has experienced an erosion in its market share over the 5 years of its existence and
is trying to diagnose whether consumers dislike certain aspects of its product.
As a basis for your analysis, you have access to historic social media data from UnterEats.
The company set up a database that collects the 10,000 most liked tweets each year and
stores the content of those tweets. UnterEats has also already implemented a text analysis
that assigns words to a set of 4 broad sentiments that are being expressed in the tweet
(excitement/novelty, content/happy, disappointment, neutral tone / indifference). Based
on the words in a tweet, their algorithm then assigns a score between 1 and 10 that
represents how strongly the content of the tweet is associated with each sentiment (based
on how many words that appear in the text are assigned to each sentiment).
Second, the algorithm also detects whether the tweet mentions specific attributes of
UnterEats. These are coded up as 0/1 variables (dummy variables) that indicate whether a
specific feature was mentioned (multiple features might be mentioned in the same tweet).
The data contains information on five product features: online platform, delivery speed,
customer service, price, restaurant network.
Finally, in the first year, the company also had some of their marketing team manually rate
whether a tweet was overall positive in tone. Similar to the sentiment variables,
brand_sentiment is measured between 1 and 10 (1=very negative tone, 10=very positive
tone). This variable is only available for 1,000 tweets in the first year of data collection
(because manual coding is expensive). However, the company would ideally like to use this
variable as their preferred measure of overall brand sentiment.
You are given access to five years of data and the company wants you to implement the
following analysis:
1) How did brand sentiment evolve over time? Does the pattern of brand sentiment
match the loss in market share that UnterEats experienced?

2) How do consumers currently feel about different features of the UnterEats service?

3) Did consumer attitudes with regard to specific features change over time? Could this
change in attitude explain the decrease in UnterEats’ market share?
2.1 First look at the data
The code below loads the data (you need to switch to your working directory before
running the code). Have a look at the data set and make sure you understand what the
various variables represent (they are all explained in the text above).
Scroll down in the spreadsheet to row 1,000. You will notice (as explained above) that the
brand_sentiment variable is only available for the first 1,000 rows. These are the 1,000
tweets for which the variable was manually coded up.
library(readr)
sentiment <- read_csv("brand_sentiment_analysis.csv")

## Parsed with column specification:
## cols(
## tweet_id = col_double(),
## year = col_double(),
## brand_sentiment = col_double(),
## excited = col_double(),
## content = col_double(),
## disappointed = col_double(),
## neutral = col_double(),
## online_platform = col_double(),
## delivery_speed = col_double(),
## price = col_double(),
## network = col_double(),
## customer_service = col_double()
## )
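
Instead of scrolling through the spreadsheet, the number of manually coded tweets can also be checked in code. The sketch below uses a small made-up data frame (`toy`, 10 rows, 3 of them coded) as a stand-in for the real data; only the column names are taken from the data set. On the actual `sentiment` data frame the same two expressions should report 1,000, assuming the data are as described above.

```r
# Toy stand-in for the real data: 10 tweets, only the first 3 hand-coded.
# The values are invented for illustration.
toy <- data.frame(
  tweet_id        = 1:10,
  brand_sentiment = c(7.2, 5.1, 8.4, rep(NA, 7))
)

# Number of rows with a manually coded brand_sentiment value
n_coded <- sum(!is.na(toy$brand_sentiment))

# Row index of the last coded tweet
last_coded <- max(which(!is.na(toy$brand_sentiment)))

print(n_coded)
print(last_coded)
```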

2.2 What determines brand sentiment.


In the first step of your analysis, you devise a way to assign a brand sentiment value for
tweets for which this information has not been coded up. In order to do so you start by
regressing brand_sentiment on the 4 individual sentiment variables provided.
NOTE: Although we use the same regression code as before, this regression will only use
the first 1,000 rows. For the remaining rows, the outcome variable does not exist and hence
these rows cannot be used in the regression.
sentiment_reg <- lm(brand_sentiment ~ excited + content + neutral +
disappointed, data=sentiment)

summary(sentiment_reg)

##
## Call:
## lm(formula = brand_sentiment ~ excited + content + neutral +
## disappointed, data = sentiment)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.80963 -0.25527 0.01035 0.25550 0.70011
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.013536 0.038413 182.58 <2e-16 ***
## excited 0.350185 0.004862 72.03 <2e-16 ***
## content 0.148169 0.006844 21.65 <2e-16 ***
## neutral -0.084480 0.003461 -24.41 <2e-16 ***
## disappointed -0.287481 0.003417 -84.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3108 on 995 degrees of freedom
## (49000 observations deleted due to missingness)
## Multiple R-squared: 0.9592, Adjusted R-squared: 0.959
## F-statistic: 5847 on 4 and 995 DF, p-value: < 2.2e-16

Interpret all regression coefficients. Does the sign and magnitude of the coefficients
appear intuitively correct?
ANSWER: The intercept represents brand sentiment when all four variables are equal to
zero. Each of the 4 slope coefficients measures the impact of a one unit change in the
respective variable on overall brand sentiment (holding everything else constant). Signs
and magnitudes are intuitive. There are two positive variables which both have a positive
impact; “excited” has a stronger positive impact on brand sentiment than “content”.
“Disappointed” has a negative impact as expected and “neutral” lies between the other
metrics as expected.
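
The NOTE in this section says that rows without an outcome value are not used in the regression. A minimal sketch of that behaviour on simulated data follows; the data frame `toy` and the coefficients used to generate it are invented for illustration. With R's default `na.action` setting (`na.omit`), `lm()` silently drops rows with a missing outcome, and `nobs()` reports how many rows were actually used.

```r
set.seed(1)
n <- 50

# Simulated sentiment scores (hypothetical data, not the UnterEats file)
toy <- data.frame(
  excited      = runif(n, 1, 10),
  disappointed = runif(n, 1, 10)
)
toy$brand_sentiment <- 7 + 0.35 * toy$excited -
  0.29 * toy$disappointed + rnorm(n, sd = 0.3)

# Only the first 10 rows are "manually coded"; the rest are missing
toy$brand_sentiment[11:n] <- NA

toy_reg <- lm(brand_sentiment ~ excited + disappointed, data = toy)

n_used    <- nobs(toy_reg)               # rows that entered the regression
n_dropped <- length(toy_reg$na.action)   # rows removed due to missingness

print(c(used = n_used, dropped = n_dropped))
```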
2.3 Extrapolating brand sentiment.
You notice that the sentiment regression has a high R-squared value and therefore
knowledge of the 4 sentiment variables allows you to predict the overall brand sentiment
variable with a high degree of confidence. In a second step you compute the predicted
brand sentiment variable for the entire data (i.e. including tweets for which brand
sentiment was not measured).
sentiment$brand_sentiment_fitted <- predict(sentiment_reg, newdata=sentiment)

Have a look at the newly generated variable inside the sentiment data-frame. Explain
how the brand_sentiment_fitted variable is computed. For the tweet in row 1001 of
the data, implement a calculation that derives the predicted brand_sentiment value
for this tweet.
ANSWER: We can obtain predictions by plugging the values for all 4 variables into the
regression formula based on the regression output from section 2.2.
For row 1001, this calculation is as follows:
Brand_sentiment_fitted = 7.013536 + 0.350185 * 6 + 0.148169 * 6 - 0.084480 * 8 -
0.287481 * 8 = 7.027969
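
The hand calculation above can be reproduced in a few lines of R. The coefficients are copied from the printed output in section 2.2 and the sentiment scores (6, 6, 8, 8) from the calculation for row 1001; the variable names `b`, `x` and `fitted_by_hand` are chosen here for illustration. Because the printed coefficients are rounded, the result may differ from the value returned by `predict()` in the last few digits.

```r
# Coefficients copied from the regression output in section 2.2
b <- c(intercept = 7.013536, excited = 0.350185, content = 0.148169,
       neutral = -0.084480, disappointed = -0.287481)

# Sentiment scores for row 1001 (the leading 1 multiplies the intercept)
x <- c(intercept = 1, excited = 6, content = 6, neutral = 8, disappointed = 8)

# Predicted brand sentiment = sum of coefficient * value
fitted_by_hand <- sum(b * x)
print(fitted_by_hand)
```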

2.4 Brand sentiment: time trends.


We now have a measure of brand sentiment for all tweets in our data. In a first step of the
actual analysis, you regress brand_sentiment_fitted on dummies for years 2,3,4,5 (year
one is left out for reasons that should become clear).
NOTE: the factor command below automatically generates dummies for years 2,3,4,5. I.e.
factor(year)2 is a dummy variable that is equal to 1 if the tweet is from year 2 and
similarly for the other variables in the regression.
summary(lm(brand_sentiment_fitted ~ factor(year), data=sentiment))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ factor(year), data = sentiment)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8920 -1.1071 0.0025 1.1085 3.9741
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.33414 0.01507 486.52 <2e-16 ***
## factor(year)2 -0.46779 0.02132 -21.94 <2e-16 ***
## factor(year)3 -0.88183 0.02132 -41.36 <2e-16 ***
## factor(year)4 -1.28014 0.02132 -60.05 <2e-16 ***
## factor(year)5 -1.71589 0.02132 -80.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.507 on 49995 degrees of freedom
## Multiple R-squared: 0.1369, Adjusted R-squared: 0.1369
## F-statistic: 1983 on 4 and 49995 DF, p-value: < 2.2e-16
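
To see the dummy variables that `factor(year)` generates, `model.matrix()` can be applied to a small example. The data frame `toy` below is made up for illustration; the column names of the result match the coefficient names in the output above.

```r
# Six hypothetical tweets from years 1 to 5 (year 2 appears twice)
toy <- data.frame(year = c(1, 2, 3, 4, 5, 2))

# Design matrix that lm() would build for "~ factor(year)":
# an intercept column plus one 0/1 column for each of years 2 to 5
dummies <- model.matrix(~ factor(year), data = toy)
print(dummies)
```

The row for year 1 has zeros in all four dummy columns, which is why year 1 acts as the baseline captured by the intercept.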

Interpret all regression coefficients. What do they tell you about the evolution of
brand sentiment over time?
ANSWER:
The intercept represents brand sentiment in year 1. Each of the other coefficients represents
the difference in brand sentiment in a given year relative to year 1. For example, the
factor(year)4 coefficient tells us that brand sentiment in year 4 is about 1.28 points lower
than in year 1.
The regression output shows a steady (almost linear) decline in brand sentiment over
time, of roughly 0.4 to 0.47 points per year.
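
The yearly averages and the year-over-year changes implied by the coefficients can be worked out as follows. The numbers are copied from the regression output in section 2.4; the variable names are chosen here for illustration.

```r
# Intercept (year 1 average) and year effects from the output in section 2.4
intercept    <- 7.33414
year_effects <- c(0, -0.46779, -0.88183, -1.28014, -1.71589)

# Average fitted brand sentiment in each year
year_means <- intercept + year_effects
names(year_means) <- paste0("year", 1:5)

# Change from one year to the next
yoy_change <- diff(year_means)

print(round(year_means, 3))
print(round(yoy_change, 3))
```

Each year-over-year change is negative and of similar size, which is what "almost linear" refers to.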

2.5 Brand sentiment: determinants.


Next, you try to understand which particular characteristics of the company determine the
sentiment consumers express in their tweets. For this analysis you focus on the most
recent year (year 5) in the data. You regress brand_sentiment_fitted on all 5 features
provided in the data.
summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==5))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.02082 -0.47439 0.03021 0.44018 2.13083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.644929 0.009518 593.07 <2e-16 ***
## online_platform 0.663949 0.035850 18.52 <2e-16 ***
## delivery_speed 0.806275 0.022102 36.48 <2e-16 ***
## price 1.117942 0.020647 54.15 <2e-16 ***
## network -0.704435 0.024158 -29.16 <2e-16 ***
## customer_service -1.591611 0.016471 -96.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6222 on 9994 degrees of freedom
## Multiple R-squared: 0.8309, Adjusted R-squared: 0.8308
## F-statistic: 9820 on 5 and 9994 DF, p-value: < 2.2e-16

Interpret all regression coefficients. Based on this analysis, which
characteristics of UnterEats do consumers appear to like or dislike?
ANSWER:
Remember that all 5 variables here are dummy variables, i.e. they only take on values 0 or
1. The intercept represents the average sentiment of a tweet in which none of the 5 product
characteristics is mentioned. Each of the other 5 coefficients captures the difference in brand
sentiment when a particular feature is mentioned (holding everything else constant).
Consumers seem to like the price level, delivery speed, and the online platform (in decreasing
order of magnitude) and dislike the small network of restaurants and, most strongly,
customer service.
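
Because all five regressors are 0/1 dummies, the predicted sentiment of a tweet is the intercept plus the coefficients of the features it mentions. A small sketch using the year 5 coefficients from the output above; the helper `predict_tweet()` is a hypothetical name introduced here for illustration.

```r
# Year 5 coefficients copied from the regression output in section 2.5
b <- c(intercept = 5.644929, online_platform = 0.663949,
       delivery_speed = 0.806275, price = 1.117942,
       network = -0.704435, customer_service = -1.591611)

# Predicted sentiment for a tweet that mentions the given features
# (hypothetical helper, not part of the original analysis)
predict_tweet <- function(features) {
  unname(b["intercept"] + sum(b[features]))
}

no_feature  <- predict_tweet(character(0))                    # baseline
price_only  <- predict_tweet("price")
net_and_svc <- predict_tweet(c("network", "customer_service"))

print(c(no_feature, price_only, net_and_svc))
```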

2.6 Brand sentiment: determinants over time


In a final step you are trying to relate the two previous pieces of analysis to each other. In
particular, you are trying to figure out whether consumers’ feelings about specific
characteristics might have deteriorated over time, which in turn led to a decrease in overall
brand sentiment. To this end you take the regression you ran in section 2.5 and now
run it separately for each year in the data.
summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==1))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5088 -0.4755 0.0161 0.4927 3.2424
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.69511 0.01339 425.458 <2e-16 ***
## online_platform 1.08412 0.02063 52.547 <2e-16 ***
## delivery_speed 1.11426 0.02283 48.802 <2e-16 ***
## price 1.22316 0.02321 52.709 <2e-16 ***
## network -1.12400 0.08664 -12.974 <2e-16 ***
## customer_service -0.04239 0.02060 -2.058 0.0396 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7468 on 9994 degrees of freedom
## Multiple R-squared: 0.7526, Adjusted R-squared: 0.7525
## F-statistic: 6082 on 5 and 9994 DF, p-value: < 2.2e-16

summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==2))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.64660 -0.46934 0.03274 0.49815 2.77959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.67482 0.01173 483.85 <2e-16 ***
## online_platform 0.95815 0.02302 41.62 <2e-16 ***
## delivery_speed 1.02601 0.02186 46.93 <2e-16 ***
## price 1.21167 0.02181 55.56 <2e-16 ***
## network -0.89795 0.05594 -16.05 <2e-16 ***
## customer_service -0.96529 0.03089 -31.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7197 on 9994 degrees of freedom
## Multiple R-squared: 0.7719, Adjusted R-squared: 0.7718
## F-statistic: 6763 on 5 and 9994 DF, p-value: < 2.2e-16

summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==3))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2848 -0.4381 0.0287 0.4448 2.3425
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.68117 0.01034 549.18 <2e-16 ***
## online_platform 0.89522 0.02519 35.54 <2e-16 ***
## delivery_speed 0.94506 0.02058 45.91 <2e-16 ***
## price 1.17022 0.02009 58.25 <2e-16 ***
## network -0.61824 0.04057 -15.24 <2e-16 ***
## customer_service -1.35446 0.02237 -60.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.661 on 9994 degrees of freedom
## Multiple R-squared: 0.8068, Adjusted R-squared: 0.8067
## F-statistic: 8346 on 5 and 9994 DF, p-value: < 2.2e-16

summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==4))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.19937 -0.45132 0.01812 0.43233 2.21224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.681612 0.009701 585.65 <2e-16 ***
## online_platform 0.754776 0.029574 25.52 <2e-16 ***
## delivery_speed 0.914528 0.020785 44.00 <2e-16 ***
## price 1.119655 0.019957 56.10 <2e-16 ***
## network -0.606291 0.030151 -20.11 <2e-16 ***
## customer_service -1.505794 0.018287 -82.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6293 on 9994 degrees of freedom
## Multiple R-squared: 0.8272, Adjusted R-squared: 0.8272
## F-statistic: 9571 on 5 and 9994 DF, p-value: < 2.2e-16

summary(lm(brand_sentiment_fitted ~ online_platform + delivery_speed + price
+ network + customer_service, data=sentiment, subset=year==5))

##
## Call:
## lm(formula = brand_sentiment_fitted ~ online_platform + delivery_speed +
## price + network + customer_service, data = sentiment, subset = year ==
## 5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.02082 -0.47439 0.03021 0.44018 2.13083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.644929 0.009518 593.07 <2e-16 ***
## online_platform 0.663949 0.035850 18.52 <2e-16 ***
## delivery_speed 0.806275 0.022102 36.48 <2e-16 ***
## price 1.117942 0.020647 54.15 <2e-16 ***
## network -0.704435 0.024158 -29.16 <2e-16 ***
## customer_service -1.591611 0.016471 -96.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6222 on 9994 degrees of freedom
## Multiple R-squared: 0.8309, Adjusted R-squared: 0.8308
## F-statistic: 9820 on 5 and 9994 DF, p-value: < 2.2e-16
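
The five regressions above repeat the same call with a different `subset`. As an alternative, the model can be fitted in a loop and the coefficients collected in one table. The sketch below runs on simulated data: the data frame `toy`, its two features, and the effect sizes are assumptions made for illustration, not the UnterEats data.

```r
set.seed(42)
n <- 500

# Simulated tweets: 100 per year, two 0/1 feature dummies
toy <- data.frame(
  year             = rep(1:5, each = n / 5),
  price            = rbinom(n, 1, 0.3),
  customer_service = rbinom(n, 1, 0.3)
)

# Simulated outcome: the customer_service effect worsens by 0.4 per year
toy$brand_sentiment_fitted <- 5.7 + 1.2 * toy$price -
  0.4 * (toy$year - 1) * toy$customer_service + rnorm(n, sd = 0.5)

years <- sort(unique(toy$year))

# Fit the same model once per year and keep only the coefficients
coef_by_year <- sapply(years, function(y) {
  coef(lm(brand_sentiment_fitted ~ price + customer_service,
          data = toy, subset = year == y))
})
colnames(coef_by_year) <- paste0("year", years)

print(round(coef_by_year, 2))
```

Each column of `coef_by_year` holds the coefficients for one year, which makes it easier to compare a feature across years than reading five separate summaries.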

Based on the regression, does it appear that consumers’ evaluation of specific brand
characteristics changed systematically over time?
ANSWER:
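The customer service coefficient changes far more than any other: it falls from -0.04 in
year 1 to -1.59 in year 5, while the coefficients for online platform, delivery speed, and
price decline only modestly (by roughly 0.4, 0.3, and 0.1 points). It appears
that tweets about customer service are becoming increasingly negative over time and
hence a deterioration of the perceived level of customer service is most likely the
underlying cause of the decline in brand sentiment.
While customers seem to dislike the small network of restaurants, consumers’ attitude
about this aspect of the company has not worsened over time (the coefficient moves from
-1.12 in year 1 to -0.70 in year 5).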
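
The comparison across years is easier to follow when the coefficients are placed side by side. The values below are copied from the five regression outputs in section 2.6; the object names `coefs` and `change` are chosen here for illustration.

```r
# Feature coefficients by year, copied from the outputs in section 2.6
coefs <- rbind(
  online_platform  = c( 1.08412,  0.95815,  0.89522,  0.754776,  0.663949),
  delivery_speed   = c( 1.11426,  1.02601,  0.94506,  0.914528,  0.806275),
  price            = c( 1.22316,  1.21167,  1.17022,  1.119655,  1.117942),
  network          = c(-1.12400, -0.89795, -0.61824, -0.606291, -0.704435),
  customer_service = c(-0.04239, -0.96529, -1.35446, -1.505794, -1.591611)
)
colnames(coefs) <- paste0("year", 1:5)

# Change in each coefficient between year 1 and year 5
change <- coefs[, "year5"] - coefs[, "year1"]

print(round(coefs, 2))
print(round(sort(change), 2))
```

Sorting the changes puts customer service first, with a drop of about 1.55 points, far larger than the movement in any other coefficient.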

2.7 Putting it all together


How would you present the results from the analysis above to UnterEats? What do
you see as the key take-aways from the analysis? What recommendations do you
have with regard to the overall strategy of UnterEats based on your analysis?
ANSWER:
We can take away various high-level points. Consumers are very happy about the price of
the service (price has the largest positive coefficient), they dislike the small network and
they strongly dislike the quality of customer service. The latter declined over time and
most likely explains the overall decline in brand sentiment.
The first order of business would be to figure out what the company did better 5 years
ago than it does now and see if the service level can be raised back to where it was.
Even if improving customer service is costly, it might be possible to compensate
for the cost increase by charging a higher price. Given consumers’ positive attitude about
the current price level, there is probably some room for adjustment.
