Professional Documents
Culture Documents
Unit II SMA
Unit II SMA
Unit II SMA
• We came across the word 'twitter', and it was just perfect. The
definition was 'a short burst of inconsequential information,' and
'chirps from birds'. And that's exactly what the product was
Twitter and its importance
• The people can send their SMSs to the whole world.
• By sharing on Twitter, a user can easily express his/her opinion for just about
everything, and at any time.
• The followers, immediately get the information about what's going on in someone's life.
Twitter vocabulary
• Twitter allows its users to express their views/sentiments through
an Internet SMS, called "tweets" in the context of Twitter.
• The metadata associated with the tweets are classified by entities and places
• The entities consist of hashtags, URLs, and other media data that users have
included in their tweets.
• The places are nothing but the locations from which the tweets originate.
• URL: http://t.co/pTBlWzTvVd
• The interface (web or mobile) on which the tweets are displayed is called the
timeline.
• tweets are, in general, arranged in chronological order of
posting time.
• All the users tweeting during some public events of widespread interest,
such as presidential debates, can achieve volumes of several hundreds of
thousands of tweets per minute.
• The ROAuth package is the one we are going to use in our experiments
• These tokens allow users to authorize third-party apps to access the data
from any user account without the need to have their passwords
Creating a new app
• The first step toward getting any kind of token access from Twitter is to
create an app on it.
• Having logged in using your credentials, the step for creating an app are as
follows:
• Go to https://apps.twitter.com/app/new.
• Put the name of your application in the Name field. This name can be
anything you like.
• The Website field needs to be filled with a valid URL, but, again, that can be
any random URL.
• Click on it and you will be provided with an Access Token and Access Token
Secret value.
• Before using the preceding keys, we need to install twitteR to access the data in R
using the app we just created. Use the following code:
• library(devtools)
• install_github("geoffjentry/twitteR").
• library(twitteR)
Four special types of information to get the authorization token:
• Key
• Secret
• Access token
• api_key<- "your_api_key"
• api_secret<- "your_api_secret"
• access_token<- "your_access_token"
• access_token_secret<- "your_access_token_secret"
• setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret)
• head(EarthQuakeTweets,2)
• [[1]]
• [1] "TamamiJapan: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom:
One Month After The Earthquake and Tsunami, 2011. Incredible. http…"
• [[2]]
• [1] "OldhamDs: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One
Month After The Earthquake and Tsunami, 2011. Incredible. http…"
Finding trending topics
trends = getTrends(woeid=woeidDelhi)
• The function availableTrendLocations() returns R dataframe containing the
name, country, and woeid parameters.
• We then filter this data frame for a location of our choosing; in this
example, its Delhi, India.
• head(trends)
• 1 #AntiHinduNGOsExposed http://twitter.com/search?q=
%23AntiHinduNGOsExposed %23AntiHinduNGOsExposed 20070458
• This function will return tweets containing the searched string, along with the other
constraints.
• since/until: This constraints the tweets to be since the given date or until the given date
• geocode: This constraints tweets to be from only those users who are located within a
certain distance from the given latitude/longitude
Twitter sentiment analysis
• Why would a company want to invest time and effort in analyzing sentiments of the
posts.
• Analyzing each post and understanding the sentiment associated with that post
helps us find out which are the key topics or themes which resonate well with the
audience.
• If the sentiment around the post is very positive, then people want to talk about the
topic in that post.
• analyzing the sentiment of a company over a period could help us relate its sales
data with the overall sentiment.
Addressing questions
• Thereby, resulting in the decline in sales during that period?
• Was there a huge spike in positive sentiment because a celebrity talked about
company’s product?
• Understanding the posts with negative sentiment could help us find the common
themes in these posts?
• Is customer service a common topic among posts which have high negative emotion?
Challenges in performing sentiment analysis on twitter
tweets
• # Authonitical keys
• head(tweets.df)
• The field ‘text’ contains the tweet part, hashtags, and URLs.
• We need to remove hashtags and URLs from the text field so that we are left only
with the main tweet part to run our sentiment analysis.
> head(tweets.df$text)
[1] "We believe that every American should stand for the National Anthem, and we proudly pledge
allegiance to one NATION… https://t.co/4GQmdSmiRk"
[2] "This is your land, this is your home, and it's your voice that matters the most. So speak up, be
heard, and fight,… https://t.co/u09Brwnow3"
[3] "Just arrived at the Pensacola Bay Center. Join me LIVE on @FoxNews in 10 minutes! #MAGA
https://t.co/RQFqOkcpNV"
[4] "On my way to Pensacola, Florida. See everyone soon! #MAGA https://t.co/ijwxVSYQ52"
[5] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in
manufacturing dropped to 2.6%, th… https://t.co/ujuFLRG8lc"
[1] "We believe that every American should stand for the National Anthem,
[2] "This is your land, this is your home, and it's your voice that matters the most.
[3] "Just arrived at the Pensacola Bay Center. Join me LIVE on "
• Some of these tweets are retweets from other users; for the given
application, we would not like to consider retweets (RTs) in sentiment
analysis
• We clean all these data by using the following code block:
• MeruTweets <- sapply(Meru_tweets, function(x) x$getText())
• OlaTweets = sapply(Ola_tweets, function(x) x$getText())
• TaxiForSureTweets = sapply(TaxiForSure_tweets, function(x) x$getText())
• UberTweets = sapply(Uber_tweets, function(x) x$getText())
Estimating sentiment (A)
• After getting the cleaned Twitter data, we are going to use few such R
packages to assess the sentiments in the tweets.
• A few tweets can be just information/facts, while others can be customer care
responses.
• Example of sentiments
• Positive: Love, best, cool, great, good, and amazing