DAC Biz&Mkt LS Sept '16 - Session 1 - Handout
1. Sentiment Analysis
According to Wikipedia, sentiment analysis (also known as opinion mining) refers to the use
of natural language processing, text analysis and computational linguistics to
identify and extract subjective information from source materials. One of the most
common applications of sentiment analysis is to track attitudes and feelings in a
text source or on the web, especially towards products, services, brands or even
people. We then determine whether these are viewed positively or negatively by an
audience.
2. Unstructured Data
Before we move on to the techniques used for text mining and sentiment analysis,
we shall first learn about unstructured data and its importance.
When we do text mining, we have a myriad of resources to choose from: emails, PDF
files, blogs and even social media sites. For this session, we will be using Facebook
as the source of our text data. Before we jump straight into our R programming, we
first have to create an application through Facebook's Developer site in order to use its
Application Programming Interface (API). An API is essentially a way for
programmers to communicate with a certain application; in this case, you will be
using the API to get data from a Facebook page. The API is the middleman between
the R environment and Facebook.
To create a Facebook application, first log in to your Facebook account and go to the
following link: https://developers.facebook.com/docs/apps/register. Scroll down to
(2. Developer Account) and click on the Create Developer Account button to
register as a Facebook Developer.
After registering, click on the Create New Facebook App button, then select
Website as the platform.
Give the app a name and click on Create New Facebook App ID.
In the next window, choose a category for your app (I picked Apps for Pages, but
any one would work) and then click Create App ID. Complete the verification step
where you have to select pictures as prompted. On the page you end up at, click
on the Skip Quick Start button to go directly to your app's settings.
Welcome to your very first Facebook app! Now, switch to RStudio and install
the required libraries/packages to call upon and utilise the Facebook web API.
> install.packages("devtools")
> library(devtools)
> install_github("pablobarbera/Rfacebook", subdir = "Rfacebook")
> library(Rfacebook)
> install.packages("httr")
> library(httr)
Package descriptions:
- devtools: tools that make it easier to develop R packages and to install packages hosted on GitHub.
- Rfacebook: a collection of functions for accessing the Facebook Graph API from R.
- httr: tools for working with HTTP, used here during the OAuth authentication.
After installing the packages, we need to connect our R session with the Facebook
app that we have just created, and authenticate it for data mining. In other words, R
will be connected to your app, and when it conducts text mining, it is able to
access whatever information is available on a page or profile. If certain
information is made accessible to your app, R is able to access it as well.
Copy your App ID and App Secret as arguments to the corresponding parameters of
the fbOAuth() function.
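The original handout shows this step as a screenshot; a minimal sketch of the call looks like the following. The App ID and App Secret below are placeholders — substitute the values from your own app's Settings page.

```r
library(Rfacebook)

# Placeholder credentials -- replace with the App ID and App Secret
# shown on your app's Settings page
fb_oauth <- fbOAuth(app_id = "123456789012345",
                    app_secret = "1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d")
```

Running fbOAuth() pauses and prints the redirect URL that the next step asks you to register.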
Copy the URL http://localhost:1410/ and go to the settings of your Facebook app
(Settings is accessible through the tab on the left side). Then, click on + Add
Platform and choose Website.
Paste the URL into the Site URL field and save the changes.
Go back to RStudio and hit the Enter key. A browser window will then open, in which
you have to allow the app to access your Facebook account.
If everything worked, the browser should show the message:
Authentication complete.
Authentication successful.
Save the token so that it can be reused in later sessions, and load it whenever you need it:
> save(fb_oauth, file = "fb_oauth")
> load("fb_oauth")
Now that we have connected everything and gained access to Facebook, we can test some
of the functions. We will start with getting our own profile information. Before we
can access such information to derive insights, we have to obtain an Access
Token. To get this token, first head to Tools & Support.
Select Graph API Explorer.
Click Get Token, then Get User Access Token, check all the boxes, click Get
Access Token and finally click OK.
Copy the newly generated Access token.
Note: The difference between using the access token generated by the app (app
access token) and the temporary access token (user access token) is that the latter
communicates with the API through your own Facebook profile instead of through
the app. Some user data that would normally be visible to an app making a
request with a user access token is not always visible with an app access token. If
you are reading user data and using it in your app, you should use a user access
token instead of an app access token. In addition, user access tokens are more
secure in that you are required to refresh them every few hours.
Following the steps above, we have established a connection to the Facebook API
and can now move on to mining data from it. However, before we get into that, we
should briefly recap what an API is.
What is an API?
Application Programming Interface, or API, is essentially a way for programmers or
developers to communicate with a certain application. So in this case, where you
would like to mine data from Facebook, you have to call its API. To do that, you have
to communicate with it using a specific language, such as a programming language;
in our case, it is R. The API serves as the middleman between the
programmer/developer and an application. This middleman receives the requests and,
should a request be a valid one, returns the requested data.
Do note that from here on, we will be using User Access Token to access the API.
We will start off with the simple task of mining details of your own Facebook profile.
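The command itself appears as a screenshot in the original handout; with Rfacebook, it would look something like this. The token string is a placeholder — paste in the User Access Token you copied from the Graph API Explorer. (The variable name my_fb_profile matches the output shown further down.)

```r
library(Rfacebook)

# Placeholder -- paste your own User Access Token here
temp_token <- "EAAC..."

# Retrieve your own profile details as a one-row data frame
my_fb_profile <- getUsers("me", token = temp_token)
```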
After executing the above, you will obtain a data frame with several variables
containing information about your Facebook profile. Let's try to access one of
these variables.
> my_fb_profile$name
[1] "Ryzal Kamis"
You can also get a list of pages that you have liked.
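The corresponding command is also a screenshot in the handout; a sketch using Rfacebook's getLikes() function would be as follows (the token string and the data frame name are my own placeholders):

```r
library(Rfacebook)

# Placeholder -- paste your own User Access Token here
temp_token <- "EAAC..."

# List up to 500 pages liked by your own profile
my_fb_likes <- getLikes(user = "me", n = 500, token = temp_token)
```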
From the data frame that you have created above, choose a Facebook page to mine
data from. Copy its unique ID from the variable id and paste it into the command to
be executed below. I will use the Facebook page of Analytics Vidhya (a great
resource for topics regarding data analytics) as an example here.
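The command appears as a screenshot in the handout; a sketch with Rfacebook's getPage() looks like this. Both the token string and the page ID are placeholders — use your own token and the id value you copied. (The variable name matches the write.xlsx() call further down.)

```r
library(Rfacebook)

# Placeholders -- your own User Access Token and the page's id value
temp_token <- "EAAC..."
analytics.vidhya_fb.posts <- getPage("123456789", token = temp_token,
                                     n = 999999)
```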
For the parameter n, a very large number is passed as the argument because, by
default, the function only gets 25 posts. Now, we shall utilise the xlsx package to
write the data frame we have created above to an Excel file.
> library(xlsx)
> write.xlsx(analytics.vidhya_fb.posts,
+ "Analytics Vidhya Facebook Posts.xlsx",
+ showNA = TRUE)
You can now view the content shared by the page through the Excel file you have just
created. Next, we will search for Facebook pages that contain certain keywords that
you specify.
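The search command is shown as a screenshot in the handout; a sketch with Rfacebook's searchPages() follows. The keyword "data" is my guess, inferred from the results below ("TE Data", "MY DATA Online Store"); the token string is a placeholder, while the data frame name fb.pages_data matches the commands that follow.

```r
library(Rfacebook)

# Placeholder -- paste your own User Access Token here
temp_token <- "EAAC..."

# Search for up to 200 pages whose names contain the keyword "data"
fb.pages_data <- searchPages("data", token = temp_token, n = 200)
```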
To identify the page with the highest number of likes or the highest
talking_about_count:
> fb.pages_data$name[which.max(fb.pages_data$likes)]
[1] "TE Data"
> fb.pages_data$name[which.max(fb.pages_data$talking_about_count)]
[1] "MY DATA Online Store"
We can do the same for Facebook groups as well. Use the function searchGroup() to
create a data frame listing Facebook groups matching a certain keyword.
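A sketch of the call, assuming a placeholder token and a keyword and variable name of my own choosing:

```r
library(Rfacebook)

# Placeholder -- paste your own User Access Token here
temp_token <- "EAAC..."

# List Facebook groups whose names match the keyword
ds_fb.groups <- searchGroup("DataScience", token = temp_token)
```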
To get the list of posts shared by a group, you can use the function getGroup().
However, do note that this function is only applicable to groups with an OPEN
privacy setting. We will be using the local group DataScience SG as an example here.
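A sketch of the call. The group ID below is illustrative — copy the real id from the searchGroup() results — and the token string is a placeholder; the data frame name matches the write.xlsx() command that follows.

```r
library(Rfacebook)

# Placeholders -- your own User Access Token and the group's id value
temp_token <- "EAAC..."
data.science.sg_fb.posts <- getGroup("987654321", token = temp_token,
                                     n = 999999)
```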
> write.xlsx(data.science.sg_fb.posts,
+ "DataScience SG Facebook Group Posts.xlsx",
+ showNA = TRUE)
For cheap thrills, let us now try updating our Facebook status through R.
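The handout shows this step as a screenshot; the Rfacebook function for it is updateStatus(). Note that posting on your behalf requires an OAuth token created with extended permissions (e.g. fbOAuth(..., extended_permissions = TRUE)), not the temporary token used above; the status text is of course your own.

```r
library(Rfacebook)

# Reload the OAuth token saved earlier; it must have been created
# with extended (publishing) permissions for this call to succeed
load("fb_oauth")
updateStatus("Posting to Facebook straight from R!", token = fb_oauth)
```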
Awesome stuff, isn't it? There are many uses for the functions that you have been
introduced to. The vast amount of information you can obtain with such
convenience speaks volumes of R's potential. This is just the tip of the iceberg.
For the next session of the Lab Series, we will be learning how to construct word
clouds. We will use data from the SIM Confessions page, so first, we have to mine
data from the page.
Mine all the posts created by the page since the date of its creation and export
them to an Excel file in .xlsx format.
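This exercise can be sketched with the same two functions used earlier. The token string and the page ID are placeholders — copy the real id of the SIM Confessions page from a searchPages() result — and the data frame and file names are my own choices.

```r
library(Rfacebook)
library(xlsx)

# Placeholders -- your own User Access Token and the page's id value
temp_token <- "EAAC..."
sim.confessions_fb.posts <- getPage("543210987", token = temp_token,
                                    n = 999999)

# Export the mined posts for use in the next session
write.xlsx(sim.confessions_fb.posts,
           "SIM Confessions Facebook Posts.xlsx",
           showNA = TRUE)
```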
We can use this Excel file as the data set to rely on for the next session.
Alternatively, the function save.image() will retain the data frames you have
created in your environment.
END OF SESSION