
Data Mining and Warehousing Project Report on

Twitter Sentiment Analysis using the R language

Done by,

Saurabh Zingade
BEB1759

DEPARTMENT OF
COMPUTER ENGINEERING

JSPM’S
IMPERIAL COLLEGE OF ENGINEERING AND RESEARCH
Wagholi, Pune 412207
Index

Abstract

Introduction

Important Terminologies

Libraries used

Requirement Specification

Twitter Developer Account

Implementation

Conclusion
1 Abstract

Sentiment analysis over Twitter offers organisations a fast and effective way to monitor the
public’s feelings towards their brand, business, directors, etc. A wide range of features and
methods for training sentiment classifiers for Twitter datasets has been researched in recent
years, with varying results. In this report, I have implemented Twitter sentiment analysis using the
R language and several packages, including syuzhet, twitteR and tm.
These packages are used to extract the sentiment behind tweets that are fetched from Twitter
using the Twitter API.

2 Introduction
The emergence of social media has given web users a venue for expressing and sharing
their thoughts and opinions on all kinds of topics and events. Twitter, with nearly 600 million
users and over 250 million messages per day, has quickly become a gold mine for organisations
to monitor their reputation and brands by extracting and analysing the sentiment of the tweets
posted by the public about them, their markets, and competitors. Sentiment analysis over Twitter
data and other similar microblogs faces several new challenges due to the typically short length and
irregular structure of such content. Two main research directions can be identified in the
literature on sentiment analysis of microblogs. The first direction is concerned with finding new
methods to run such analysis, such as performing sentiment label propagation on Twitter
follower graphs and employing social relations for user-level sentiment analysis. The second
direction is focused on identifying new sets of features to add to the trained model for sentiment
identification, such as microblogging features (hashtags, emoticons, and the presence of
intensifiers such as all-caps and character repetitions) and sentiment-topic features.
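To make the microblogging features mentioned above concrete, here is a minimal base R sketch that counts hashtags, emoticons, all-caps words and character repetitions in each tweet. The function name extract_microblog_features() and the regular expressions are my own illustrative assumptions; they are not part of any package used in this report and will miss many real-world variants.

```r
# Count simple microblogging features per tweet (base R only).
# The patterns are illustrative assumptions, not a standard definition.
extract_microblog_features <- function(tweets) {
  # Number of non-overlapping matches of `pattern` in each element of `x`
  count_matches <- function(pattern, x) {
    m <- gregexpr(pattern, x, perl = TRUE)
    vapply(m, function(pos) sum(pos > 0), integer(1))  # -1 means "no match"
  }
  data.frame(
    hashtags   = count_matches("#\\w+", tweets),                # e.g. #match
    emoticons  = count_matches("[;:=][-']?[)(DPp]", tweets),    # e.g. :) ;-P =D
    all_caps   = count_matches("\\b[A-Z]{2,}\\b", tweets),      # e.g. LOVE
    repetition = count_matches("(\\w)\\1{2,}", tweets),         # e.g. soooo
    stringsAsFactors = FALSE
  )
}

extract_microblog_features(c("I LOVE this #match :) soooo good", "meh"))
```

For this example, the first tweet should get a count of 1 in every column and the second tweet 0 in every column.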
3 Important Terminologies

3.1 What is Sentiment Analysis?

Sentiment essentially relates to feelings: attitudes, emotions and opinions. Sentiment
Analysis refers to the practice of applying Natural Language Processing and Text Analysis
techniques to identify and extract subjective information from a piece of text. A person’s opinion
or feelings are, for the most part, subjective rather than factual. This means that accurately analysing an
individual’s opinion or mood from a piece of text can be extremely difficult. With Sentiment
Analysis, from a text analytics point of view, we are essentially trying to understand
the attitude of a writer with respect to a topic in a piece of text, and its polarity: whether it is
positive, negative or neutral.
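The idea of polarity can be illustrated with a toy lexicon-based scorer: count the positive and negative words in a text and look at the sign of the difference. This base R sketch is only an illustration; the two word lists and the polarity() function are hypothetical and far smaller than real sentiment dictionaries such as those shipped with syuzhet.

```r
# Toy lexicon-based polarity: score = (# positive words) - (# negative words).
# Both word lists are hypothetical and much too small for real use.
positive_words <- c("good", "great", "love", "happy", "excellent")
negative_words <- c("bad", "poor", "hate", "sad", "terrible")

polarity <- function(text) {
  # Lower-case and split on runs of characters that are not letters or apostrophes
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  score <- sum(words %in% positive_words) - sum(words %in% negative_words)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

polarity("I love this, great game!")   # "positive"
polarity("What a terrible, sad loss")  # "negative"
polarity("The match starts at nine")   # "neutral"
```

Real lexicons also weight words and handle negation ("not good"), which this sketch deliberately ignores.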

3.2 What are Stop Words?

When working with text mining applications, we often hear of the term “stop words”, “stop word
list” or even “stop list”. Stop words are a set of commonly used words in a language; every language,
not just English, has them. Stop words matter to many applications because, if we remove the words that
are very commonly used in a given language, we can focus on the important words instead.

Stop words are generally thought of as a single set of words, but the term can mean different things to
different applications. For example, in some applications an appropriate stop list may cover everything from
determiners (e.g. the, a, an) to prepositions (e.g. above, across, before) and even some adjectives (e.g. good,
nice). For other applications, however, this can be detrimental. For instance, in sentiment analysis, removing
adjectives such as ‘good’ and ‘nice’ as well as negations such as ‘not’ can throw algorithms off track. In
such cases, one can choose a minimal stop list consisting of just determiners, determiners with prepositions,
or just coordinating conjunctions, depending on the needs of the application.
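The trade-off described above can be shown in a few lines of base R. The sketch compares a broad, hypothetical stop list that also drops adjectives and the negation ‘not’ with a minimal list of determiners only; remove_stop_words() is an illustrative helper, not a function from tm.

```r
# Remove every token that appears in `stop_list` (base R only)
remove_stop_words <- function(text, stop_list) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]            # drop empty strings left by the split
  paste(words[!(words %in% stop_list)], collapse = " ")
}

# Hypothetical broad list: determiners, prepositions, a verb, adjectives and a negation
broad_stop_list <- c("the", "a", "an", "above", "across", "before",
                     "is", "good", "nice", "not")
# Hypothetical minimal list: determiners only
minimal_stop_list <- c("the", "a", "an")

sentence <- "The food is not good"
remove_stop_words(sentence, broad_stop_list)    # "food": the sentiment is lost
remove_stop_words(sentence, minimal_stop_list)  # "food is not good"
```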

3.3 What are Word Clouds?

Word clouds (also known as text clouds or tag clouds) work in a simple way: the more often a specific
word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it
appears in the word cloud.
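The counting that drives a word cloud can be sketched in base R: split the text into lower-cased words and tabulate them. word_frequencies() is a hypothetical helper name used only for this illustration.

```r
# Count how often each word occurs across a set of texts (base R only)
word_frequencies <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  words <- words[nzchar(words)]
  sort(table(words), decreasing = TRUE)   # most frequent words first
}

freq <- word_frequencies(c("Goal goal GOAL", "great goal", "great save"))
freq  # goal: 4, great: 2, save: 1
```

With the wordcloud package loaded, passing names(freq) and as.integer(freq) as the words and freq arguments of wordcloud() should draw the corresponding cloud, with "goal" the largest word.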

3.4 What is Text Mining?

Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process
of deriving high-quality information from text. High-quality information is typically derived by
devising patterns and trends through means such as statistical pattern learning. Text mining usually
involves structuring the input text (usually parsing, along with the addition of some derived
linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns
within the structured data, and finally evaluating and interpreting the output. 'High quality' in text
mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks
include text categorization, text clustering, concept/entity extraction, production of granular taxonomies,
sentiment analysis, document summarization, and entity relation modelling (i.e., learning relations
between named entities).
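As a rough illustration of the "structuring the input text" step, the sketch below turns a few documents into a document-term matrix (one row per document, one column per distinct word, cells holding counts) using base R only. In the tm package this structure is produced by DocumentTermMatrix(); build_dtm() here is a hypothetical stand-in that skips stemming, weighting and sparse storage.

```r
# Build a small dense document-term matrix (base R only)
build_dtm <- function(docs) {
  # Tokenise: lower-case, split on runs of non-letters, drop empty tokens
  tokens <- lapply(strsplit(tolower(docs), "[^a-z']+"), function(w) w[nzchar(w)])
  vocab <- sort(unique(unlist(tokens)))
  # Count every vocabulary word in each document (terms x documents) ...
  counts <- vapply(tokens,
                   function(w) as.integer(table(factor(w, levels = vocab))),
                   integer(length(vocab)))
  # ... then transpose so that rows are documents and columns are terms
  dtm <- t(matrix(counts, nrow = length(vocab)))
  dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)
  dtm
}

build_dtm(c("Ronaldo scores", "Ronaldo misses again"))
```

Once text is in this matrix form, ordinary statistical tools (frequency counts, clustering, classifiers) can be applied to it, which is the "deriving patterns within the structured data" step.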

4 Libraries used

4.1 twitteR
twitteR is an R package which provides access to the Twitter API. Most functionality of
the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to
daily interaction.

4.2 tm
A framework for text mining applications within R.

4.3 Syuzhet
The syuzhet package extracts sentiment and plot information from text.
The package comes with four sentiment dictionaries and also provides a method for accessing the
robust, but computationally expensive, sentiment extraction tool developed by the NLP group at
Stanford. Using this latter method requires that the coreNLP package is already installed.

Its main functions let you quickly extract plot and sentiment data from your own text files, and the
extracted data can be returned and/or visualized in various ways.

4.4 Wordcloud
Functionality to create pretty word clouds, visualize differences and similarity
between documents, and avoid over-plotting in scatter plots with text.
5 Requirement Specification

5.1 Hardware Requirement

(a) Hard Disk: 80 GB (minimum)

(b) RAM: 4 GB (minimum)

5.2 Software Requirement

(a) Operating System:

• Windows / Linux / macOS

5.3 Development Tools

(a) R Programming Language

(b) RStudio

(c) The libraries mentioned above.

(d) A personal computer with the minimum configuration will do; for better performance of the
program, the configuration can be enhanced.

Note: A Twitter developer account is also required to perform this analysis.
6 Twitter Developer Account

Twitter now manually approves all developer requests for access to API keys.

Given the highly political nature of our global society and the high number of spammers at work
in our economy, who can blame them? In a world where botnets can be created overnight, social media
corporations are discovering that they have to be more careful about how they allow their platforms to be
automated.

Manual approval, of course, slows things down. It can also make or break a person’s
ambitions. Students may not be able to begin (or complete) projects on time. SaaS (software as a
service) companies may not be able to move forward with their commercial projects. Individuals might
not be able to create their novelty bots. With the judge and jury sitting on the other side, apprehension can
set in.

The Twitter developer portal is a set of self-serve tools that developers can use to manage their access to
the premium APIs, as well as to create and manage their Twitter apps.
The portal is made up of the following pages:
● A developer dashboard that displays Premium API usage and subscription level.
● A subscriptions page where you can manage and view additional details about your Premium
subscription level.
● An apps page where you can create and manage your Twitter Apps.
● An environments page where you can set up your developer environments.
● A billing page where you can view your payment details and previous invoices.
● A teams page where you can add and manage the different handles that have access to your
team's Premium APIs.
6.1 Steps for creating a twitter developer account

1. Visit https://developer.twitter.com

2. Click on Apply and choose your reason for using the developer account tools.
3. Provide some personal details.

4. Describe how you are planning to use the Twitter data fetched from the API.
6.2 Creation of App for getting API keys and tokens

1. Navigate to My Applications.

2. Click on “Create New App”. (Since I have already created this app, it appears on my page.)
3. Fill in all the details in the application.

4. Once all the details are filled in and verified, you will be granted the consumer and access
keys.
7 Implementation

7.1 Adding Libraries

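# Install the packages (only needed once)
install.packages("twitteR")
install.packages("RCurl")
install.packages("base64enc")
install.packages("httr")
install.packages("tm")
install.packages("wordcloud")
install.packages("syuzhet")   # provides get_nrc_sentiment(), used in 7.6
# Load the packages in every session
library(twitteR)
library(RCurl)
library(base64enc)
library(httr)
library(tm)
library(wordcloud)
library(syuzhet)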

7.2 Adding the consumer keys and access tokens to variables

# Replace these placeholders with the keys and tokens of your own Twitter app
consumer_key <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
consumer_secret <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
access_token <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
access_secret <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
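Hard-coding the keys in the script risks leaking them if the script is shared. A minimal alternative sketch, assuming you have exported the four keys as environment variables under the hypothetical names below, reads them with base R's Sys.getenv() and fails early if any is missing. read_twitter_keys() is an illustrative helper, not part of twitteR.

```r
# Read the Twitter credentials from environment variables (hypothetical names)
read_twitter_keys <- function() {
  keys <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                       "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))
  # Sys.getenv() returns "" for unset variables
  missing <- names(keys)[keys == ""]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  as.list(keys)
}
```

The returned list can then be passed to setup_twitter_oauth() in place of the literal strings, e.g. keys <- read_twitter_keys() followed by setup_twitter_oauth(keys$TWITTER_CONSUMER_KEY, keys$TWITTER_CONSUMER_SECRET, keys$TWITTER_ACCESS_TOKEN, keys$TWITTER_ACCESS_SECRET).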

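7.3 Setting up the connection with the Twitter API

# Authenticate this R session; setup_twitter_oauth() is exported by twitteR,
# so the internal ":::" accessor is not needed once the package is loaded
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)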
7.4 Getting tweets related to one particular topic (in this case, Cristiano)
7.5 Cleaning the tweets: removing stop words, punctuation marks, etc.
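With tm, this cleaning is commonly done by building a corpus with VCorpus(VectorSource(text)) and applying tm_map() with content_transformer(tolower), removePunctuation, removeNumbers, removeWords (with stopwords("english")) and stripWhitespace. As a dependency-free sketch of the same idea, the base R function below also strips the retweet marker, URLs and @mentions that are common in tweets. clean_tweets(), its default stop list and the regular expressions are illustrative assumptions and will not catch every case.

```r
# Clean raw tweet text (base R only; patterns are illustrative assumptions)
clean_tweets <- function(tweets,
                         stop_words = c("the", "a", "an", "is", "are", "to", "of", "and")) {
  x <- tolower(tweets)
  x <- gsub("^rt\\s+", "", x, perl = TRUE)        # leading retweet marker
  x <- gsub("https?://\\S+", "", x, perl = TRUE)  # URLs
  x <- gsub("@\\w+", "", x, perl = TRUE)          # @mentions
  x <- gsub("[[:punct:]]+", " ", x)               # punctuation, including '#'
  x <- gsub("[[:digit:]]+", "", x)                # numbers
  words <- strsplit(x, "\\s+", perl = TRUE)
  vapply(words, function(w) {
    w <- w[nzchar(w) & !(w %in% stop_words)]      # drop empty tokens and stop words
    paste(w, collapse = " ")
  }, character(1))
}

clean_tweets(c("RT @fan_1: Cristiano scored AGAIN!!! 2 goals #UCL https://t.co/abc123",
               "The match is a draw"))
# "cristiano scored again goals ucl" "match draw"
```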

7.6 Using get_nrc_sentiment() to get the sentiments from the words of the tweets
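get_nrc_sentiment() from syuzhet takes a character vector and returns a data frame with one row per element and ten columns: the eight NRC emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus negative and positive. The base R sketch below only mimics that output shape so that it runs without syuzhet; toy_lexicon is a hypothetical, tiny word-to-emotion table and is not the NRC lexicon, and count_emotions() is an illustrative helper.

```r
emotions <- c("anger", "anticipation", "disgust", "fear", "joy",
              "sadness", "surprise", "trust", "negative", "positive")

# Hypothetical miniature lexicon: one row per (word, emotion) pair.
# It is NOT the NRC lexicon used by syuzhet.
toy_lexicon <- data.frame(
  word    = c("win", "win", "goal", "lose", "lose", "injury", "love"),
  emotion = c("joy", "positive", "anticipation", "sadness", "negative",
              "fear", "positive"),
  stringsAsFactors = FALSE
)

count_emotions <- function(texts, lexicon = toy_lexicon) {
  per_text <- vapply(strsplit(tolower(texts), "[^a-z']+"), function(words) {
    # Every token adds one count to each emotion it is linked to
    idx <- unlist(lapply(words, function(w) which(lexicon$word == w)))
    as.integer(table(factor(lexicon$emotion[idx], levels = emotions)))
  }, integer(length(emotions)))
  counts <- t(matrix(per_text, nrow = length(emotions)))  # one row per text
  colnames(counts) <- emotions
  as.data.frame(counts)
}

scores <- count_emotions(c("What a win, a late goal!", "Injury blow: we lose"))
colSums(scores)  # totals per emotion across all texts
```

With syuzhet installed, the real call on the cleaned tweet text is simply get_nrc_sentiment() applied to that character vector.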
7.7 Extracting only the positive, negative and neutral tweets
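Given a data frame with positive and negative columns, such as the output of get_nrc_sentiment(), one simple rule is to label each tweet by the sign of positive minus negative. This rule and the classify_polarity() helper are assumptions for illustration; the NRC output itself has no neutral column, so some rule of this kind is needed to define "neutral".

```r
# Label each row as positive, negative or neutral from its NRC-style counts.
# Assumed rule (illustrative): the sign of positive minus negative.
classify_polarity <- function(sentiment) {
  score <- sentiment$positive - sentiment$negative
  label <- ifelse(score > 0, "positive", ifelse(score < 0, "negative", "neutral"))
  factor(label, levels = c("positive", "negative", "neutral"))
}

example <- data.frame(positive = c(2, 0, 1), negative = c(0, 3, 1))
labels <- classify_polarity(example)
table(labels)              # one tweet in each class
prop.table(table(labels))  # the same counts as proportions
```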

7.8 After Plotting the Graph
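A bar chart of emotion totals is a common way to present this kind of result. The sketch below sums each column of an NRC-style data frame and draws the totals, largest first, with base R's barplot(). plot_emotion_totals() is a hypothetical helper; colours and labels are arbitrary choices.

```r
# Sum each emotion column and draw the totals as a bar chart (base R graphics)
plot_emotion_totals <- function(sentiment) {
  totals <- sort(colSums(sentiment), decreasing = TRUE)  # largest bar first
  barplot(totals,
          las  = 2,                 # vertical axis labels so long names fit
          col  = "steelblue",
          ylab = "Count",
          main = "Emotion word counts across tweets")
  invisible(totals)                 # return the totals for further use
}
```

For example, calling plot_emotion_totals() on a data frame returned by get_nrc_sentiment() should draw one bar per NRC column.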


8 Conclusion

Text processing and sentiment analysis have emerged as a challenging field with many
obstacles, as they involve natural language processing. They have a wide variety of applications that could
benefit from their results, such as news analytics, marketing and question answering. Getting
important insights from opinions expressed on the internet, especially on social media and blogs, is vital for
many companies and institutions, whether in terms of product feedback, public mood, or investors’
opinions.

Sentiment analysis is a difficult technology to get right. However, when you do, the benefits are great.

Look for a tool that uses Natural Language Processing technology, ideally with machine learning
capabilities. Look for a vendor that treats sentiment analysis seriously and shows advancements and
updates in its sentiment analysis technology.
