CHAPTER 1
INTRODUCTION
1.1 Introduction
Sentiment Analysis is a term that you must have heard if you have been in the tech field
long enough. It is the process of predicting whether a piece of information (i.e. text,
most commonly) indicates a positive, negative or neutral sentiment on the topic. In this
project, we will go through making a Python program that analyzes the sentiment of tweets
on a particular topic. The user will be able to input a keyword and get the sentiment on it
based on the latest 100 tweets that contain the input keyword. Also known as “Opinion
Mining”, Sentiment Analysis refers to the use of Natural Language Processing to determine
the attitude, opinions and emotions of a speaker, writer, or other subject within an
online mention. Essentially, it is the process of determining whether a piece of writing
is positive or negative. This is also called the polarity of the content. As humans, we
are able to classify text as positive or negative subconsciously. For example, the
sentence “The kid had a gorgeous smile on his face” will most likely give us a
positive sentiment. In layman’s terms, we arrive at such a conclusion by examining
the words and averaging out the positives and the negatives. For instance, the words
“gorgeous” and “smile” are more likely to be positive, while words like “the”, “kid”
and “face” are really neutral. Therefore, the overall sentiment of the sentence is likely to be
positive. A common use for this technology comes from its deployment in the social
media space to discover how people feel about certain topics, particularly through
users’ word-of-mouth in textual posts or, in the context of Twitter, their tweets.
1.2 Prerequisites
Basic programming knowledge
Although Python is heavily involved in this mini-project, you are not required to have
deep knowledge of the language, as long as you have basic programming knowledge.
Installed tools
For this program, we will need Python to be installed on the computer. We will be
using the libraries tweepy, textblob, nltk, matplotlib, re, and csv (see the code in
Chapter 4). You are likely to have to install the first four; re and csv already come
with the Python standard library. It doesn’t hurt to check that they’re up to date though.
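If any of these are missing, they can be installed with pip; a typical invocation (package names taken from the code in Chapter 4) is:
$ pip install tweepy textblob nltk matplotlib
$ python -m textblob.download_corpora
The second command downloads the corpora that TextBlob relies on; the same command appears again in the TextBlob installation walkthrough in Chapter 2.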
Moving on from the text formatting techniques employed, we now come to the list of
features that we have explored. As we will see below, a feature is any variable which
can help our classifier in differentiating between the different classes. There are two
kinds of classification in our system (as will be discussed in detail in the next
section): the objectivity / subjectivity classification and the positivity / negativity
classification. As the name suggests, the former is for
differentiating between objective and subjective classes while the latter is for
differentiating between positive and negative classes.
The list of features explored for objective / subjective classification is given
below (a small feature-extraction sketch follows the two lists):
• Number of exclamation marks in a tweet
• Number of question marks in a tweet
• Presence of exclamation marks in a tweet
• Presence of question marks in a tweet
• Presence of a URL in a tweet
• Presence of emoticons in a tweet
• Unigram word models calculated using Naive Bayes
• Prior polarity of words through the online lexicon MPQA
• Number of digits in a tweet
• Number of capitalized words in a tweet
• Number of capitalized characters in a tweet
• Number of punctuation marks / symbols in a tweet
The list of features explored for positive / negative classification is given below:
• Overall emoticon score (where 1 is added to the score for a positive emoticon, and 1 is subtracted for a negative emoticon)
• Ratio of non-dictionary words to the total number of words in the tweet
• Length of the tweet
• Number of adjectives in a tweet
• Number of comparative adjectives in a tweet
• Number of superlative adjectives in a tweet
• Number of base-form verbs in a tweet
• Number of past tense verbs in a tweet
• Number of present participle verbs in a tweet
• Number of past participle verbs in a tweet
• Number of 3rd-person singular present verbs in a tweet
• Number of non-3rd-person singular present verbs in a tweet
• Number of adverbs in a tweet
• Number of personal pronouns in a tweet
• Number of possessive pronouns in a tweet
• Number of singular proper nouns in a tweet
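As an illustration, here is a minimal sketch of how a few of the surface features above could be computed for a single tweet. The function and key names are ours, not taken from the project code, and the emoticon pattern is a deliberately simple assumption:

import re

def surface_features(tweet):
    # Compute a handful of the surface features listed above for one tweet.
    words = tweet.split()
    return {
        "num_exclamations": tweet.count("!"),
        "num_questions": tweet.count("?"),
        "has_exclamation": "!" in tweet,
        "has_question": "?" in tweet,
        "has_url": bool(re.search(r"\w+://\S+", tweet)),
        "has_emoticon": bool(re.search(r"[:;=][-o^]?[)(DPp]", tweet)),
        "num_digits": sum(ch.isdigit() for ch in tweet),
        "num_capitalized_words": sum(1 for w in words if w[:1].isupper()),
    }

print(surface_features("Wow, LOVE this phone!! Best buy of 2016 :) http://t.co/abc"))

In practice each tweet would be mapped to such a dictionary (or vector) before being handed to the classifier.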
CHAPTER 2
LITERATURE SURVEY
for defining their training data (Pak and Paroubek 2010; Bifet and Frank 2010).
Barbosa and Feng (2010) exploit existing Twitter sentiment sites for collecting
training data. Davidov, Tsur, and Rappoport (2010) also use hashtags for
creating training data, but they limit their experiments to sentiment/non-sentiment
classification, rather than 3-way polarity classification, as we do. We use
WEKA and apply the following machine learning algorithms for this second
classification to arrive at the best result (an illustrative Python sketch follows the list):
• K-Means Clustering
• Support Vector Machine
• Logistic Regression
• K-Nearest Neighbours
• Naive Bayes
• Rule Based Classifiers
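The report runs these algorithms in WEKA; purely as an illustration in Python, a comparable experiment can be sketched with scikit-learn on a toy corpus (the data below is hypothetical, and K-Means is omitted since it is a clustering method rather than a classifier):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

# toy corpus standing in for the labelled tweets (1 = positive, 0 = negative)
tweets = ["love this phone", "great day", "amazing match", "happy now",
          "hate this phone", "terrible day", "awful match", "sad now"] * 5
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5

X = CountVectorizer().fit_transform(tweets)  # unigram count features
classifiers = {"Support Vector Machine": SVC(),
               "Logistic Regression": LogisticRegression(max_iter=1000),
               "K-Nearest Neighbours": KNeighborsClassifier(),
               "Naive Bayes": MultinomialNB()}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, labels, cv=5)  # 5-fold cross-validation
    print(name, round(scores.mean(), 3))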
2.2.1 Tokenization
The first task, which must be completed before any processing can occur, is to
divide the textual data into smaller components. This is a common step in a
Natural Language Processing (NLP) application, known as tokenization. At a
higher level, the text is initially divided into paragraphs and sentences. As a
consequence of the 140-character length limitation imposed by Twitter, it is
rarely the case that a tweet will contain more than a paragraph. In this regard,
the aim at this step is to correctly identify sentences. This can be done
by interpreting punctuation marks, such as the period “.”, within the text
analysed. The next step is to extract the words (tokens) from sentences. The
challenge at this step is to handle the orthography within a sentence.
Consequently, spelling errors have to be corrected, and URLs and punctuation
must be excluded from the resulting set of tokens. As can be observed in Figure
2.1, after tokenizing a tweet the returned result is an array containing a set of
strings.
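A minimal sketch of this step, using NLTK's TweetTokenizer (one possible tokenizer; the handle stripping and URL filtering shown here are our assumptions, not the project's exact implementation):

from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer(strip_handles=True, reduce_len=True)
tweet = "@user Sooooo happy with the new phone!!! http://t.co/abc :)"
tokens = tokenizer.tokenize(tweet)
# URLs can then be filtered out of the resulting set of tokens
tokens = [t for t in tokens if not t.startswith("http")]
print(tokens)  # e.g. ['Sooo', 'happy', 'with', 'the', 'new', 'phone', '!', '!', '!', ':)']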
2.2.4 N-grams
Forming n-grams is a common technique in text mining, where word subsets of length n
within a sentence are formed. From the sentence “This is a six words
sentence!” the following n-grams can be formed:
1-grams (unigrams): “this”, “is”, “a”, “six”, “words”, “sentence”
2-grams (bigrams): “this is”, “is a”, “a six”, “six words”, “words sentence”
3-grams (trigrams): “this is a”, “is a six”, “a six words”, “six words sentence”
As such, the example sentence above will produce 6 unigrams, 5 bigrams, and
4 trigrams. On a bigger data set, producing bigrams and trigrams will
contribute considerably to the size of the data set, consequently slowing down
the system. An approach to solving this challenge is detailed in a later chapter.
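For instance, a minimal sketch of n-gram generation (a hypothetical helper, not the project's code):

def ngrams(words, n):
    # Return all contiguous subsequences of length n as space-joined strings.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "this is a six words sentence".split()
print(ngrams(words, 1))  # 6 unigrams
print(ngrams(words, 2))  # 5 bigrams
print(ngrams(words, 3))  # 4 trigrams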
In supervised learning, for example, an algorithm may be trained on pictures labelled
according to whether they contain dogs. When presented with new unlabelled pictures, the
algorithm will decide, based on what it ‘saw’ before, the type of animal in the
picture. Hence, the goal is to ‘learn’ a general rule that maps input to output [6].
Natural Language Processing (NLP) is concerned with text analysis and manipulation.
It’s a vast and vibrant field with a long history! New
research and techniques are being developed constantly. The aim here is to introduce
a few simple concepts and techniques from NLP: just the stuff that’ll help you
do creative things quickly, and maybe open the door for you to understand
more sophisticated NLP concepts that you might encounter elsewhere.
The most commonly known library for doing NLP in Python is NLTK. NLTK
is a fantastic library, but it’s also a writhing behemoth: large and slippery and
difficult to understand. TextBlob is a simpler, more humane interface to much
of NLTK’s functionality: perfect for NLP beginners or poets that just want to
get work done.
Natural language
“Natural language” is a loaded phrase: what makes one stretch of language
“natural” while another stretch is not? NLP techniques are opinionated about
what language is and how it works; as a consequence, you’ll sometimes find
yourself having to conceptualize your text with uncomfortable abstractions in
order to make it work with NLP. (This is especially true of poetry, which
almost by definition breaks most “conventional” definitions of how language
behaves and how it’s structured.) Of course, a computer can never really fully
“understand” human language.
Even when the text you’re using fits the abstractions of NLP perfectly, the
results of NLP analysis are always going to be at least a little bit inaccurate.
The main assumption that most NLP libraries and techniques make is that the
text you want to process will be in English. Historically, most NLP research
has been on English specifically; it’s only more recently that serious work has
gone into applying these techniques to other languages. The examples in this
chapter are all based on English texts, and the tools we’ll use are geared toward
English. If you’re interested in working on NLP in other languages, here are a
few starting points:
KoNLPy, natural language processing in Python for Korean
Jieba, text segmentation and POS tagging in Python for Chinese
The Pattern library (like TextBlob, a simplified/augmented interface to NLTK)
includes POS-tagging and some morphology for Spanish in
its pattern.es package.
Morphology
“Morphology” is the word that linguists use to describe all of the weird ways
that individual words get modified to change their meaning, usually with
prefixes and suffixes, e.g.:
do -> redo
sing -> singing
monarch -> monarchy
teach -> taught
A word’s “lemma” is its most “basic” form, the form without any morphology
applied to it. “Sing,” “sang,” “singing,” are all different “forms” of the
lemma sing. Likewise, “octopi” is the plural of “octopus”; the “lemma” of
“octopi” is octopus. “Lemmatizing” a text is the process of going through the
text and replacing each word with its lemma. This is often done in an attempt to
reduce a text to its most “essential” meaning, by eliminating pesky things like
verb tense and noun number.
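TextBlob, which is introduced below, exposes lemmatization directly on its Word objects; a quick sketch (expected outputs shown as comments):

from textblob import Word

print(Word("octopi").lemmatize())      # octopus
print(Word("singing").lemmatize("v"))  # sing (lemmatize as a verb)
print(Word("sang").lemmatize("v"))     # sing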
Pluralization
There’s one super important kind of morphology, and that’s the rules for taking
a singular noun and turning it into a plural noun, like:
cat -> cats (easy)
cheese -> cheeses (also easy...)
knife -> knives (complicated)
child -> children (huh?)
Even though we as humans employ plural morphology in pretty much every
sentence, without a second thought, the rules for how to do it are actually really
weird and complicated (as with all elements of human language). A computer
is never going to be able to 100% accurately make this transformation. But,
again, some NLP libraries try.
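TextBlob is one such library: its Word objects provide a pluralize() method. A quick sketch, with the outputs we'd expect for the examples above:

from textblob import Word

for noun in ["cat", "cheese", "knife", "child"]:
    print(noun, "->", Word(noun).pluralize())
# cat -> cats, cheese -> cheeses, knife -> knives, child -> children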
Using TextBlob
At this point, let’s actually start using TextBlob so we can put this boring
theory into practice.
Installation. To use TextBlob, we of course need to install it! And to install it, we
need to create or activate a virtualenv.
$ virtualenv venv
$ source venv/bin/activate
Then, install TextBlob with pip:
$ pip install textblob
Wait as things happen. When it’s all done, you might need to download the
TextBlob corpora; to do so, type the following on the command line:
$ python -m textblob.download_corpora
Sentences, words and noun phrases
Let’s play with TextBlob in the interactive interpreter. The basic steps for using
TextBlob are:
Create a TextBlob object, passing a string with the text we want to work with.
Use various methods and attributes of the resulting object to get at various parts
of the text.
For example, here’s how to get a list of sentences from a string:
>>> from textblob import TextBlob
>>> blob = TextBlob("ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.")
>>> for sentence in blob.sentences:
...     print(sentence)
ITP is a two-year graduate program located in the Tisch School of the Arts.
Perhaps the best way to describe us is as a Center for the Recently Possible.
The .sentences attribute of the resulting object is a list of sentences in the text.
(Much easier than trying to split on punctuation, right?) Each sentence object
also has an attribute .words that has a list of words in that sentence.
>>> from textblob import TextBlob
>>> blob = TextBlob("ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.")
>>> for word in blob.sentences[1].words:
...     print(word)
The TextBlob object also has a “.noun_phrases” attribute that simply returns
the text of all noun phrases found in the original text:
>>> from textblob import TextBlob
>>> blob = TextBlob("ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.")
>>> for np in blob.noun_phrases:
...     print(np)
itp
two-year graduate program
tisch
recently
(As you can see, this isn’t terribly accurate, but we’re working with computers
here. What are you going to do.)
“Tagging” parts of speech
TextBlob can also tell us what part of speech each word in a text corresponds
to. It can tell us if a word in a sentence is functioning as a noun, an adjective, a
verb, etc. In NLP, associating a word with a part of speech is called “tagging.”
Correspondingly, the attribute of the TextBlob object we’ll use to access this
information is .tags.
>>> from textblob import TextBlob
>>> blob = TextBlob("I have a lovely bunch of coconuts.")
>>> for word, pos in blob.tags:
...     print(word, pos)
This for loop is a little weird, because it has two temporary loop variables
instead of one. (The underlying reason for this is that .tags evaluates to a list of
two-item tuples, which we can automatically unpack by specifying two items in
the for loop. Don’t worry about this if it doesn’t make sense. Just know that
when we’re using the .tags attribute, you need two loop variables instead of
one.) The first variable, which we’ve called word here, contains the word; the
second variable, called pos here, contains the part of speech.
A Support Vector Machine attempts to find a line (hyperplane) that separates the
classes. When no such line can be established, a kernel function is used to map the data
into a higher-dimensional plane, where the data can be separated into its
component classes. When visualised in the lower-dimensional plane, the line
separating the classes appears curved.
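As an illustration of this kernel trick, here is a minimal scikit-learn sketch (not part of the project's engine) on a data set that no straight line can separate:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two classes arranged in concentric circles: not linearly separable in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # the kernel maps the data to a higher-dimensional space

print("linear kernel accuracy:", linear.score(X, y))  # poor: no separating line exists
print("RBF kernel accuracy:", rbf.score(X, y))        # near 1.0: separable after mapping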
CHAPTER 3
SYSTEM DEVELOPMENT
Furthermore, advanced libraries for data processing such as NumPy and SciPy
were developed by the scientific community and domain experts [29]. Such
tools proved to be helpful during the development stage, and reinforced the
reasoning of choosing Python.
Together, the pipeline and the aforementioned components form the engine
of the system. One of the early objectives of the project was developing a
working model of the engine. As a consequence, the user interface was
produced in later development stages. The architectural overview is
illustrated in Figure 3.1. The following subsections will cover in detail the
design and the requirements gathering for each component of the system.
The engine also had to serve more than one user at a time, and to manage multiple requests
from the same user without having to run multiple instances of the engine [14].
Another important design problem was data storage. Here, two database architectures
were considered: a lightweight database, Redis, using a file system stored in RAM,
and a relational database, PostgreSQL. Furthermore, the engine has to support both
the real-time component and the long-term sentiment analysis component. Consequently,
Redis was the first choice, as it proved to be an efficient solution for both temporary
and permanent storage, being particularly useful for caching operations in the long-
term component. A more detailed list of non-functional requirements is outlined in
Table 3.2.
Table 3.2: Engine - non-functional requirements
Table 3.4 presents the non-functional requirements developed for the graphical user interface.
Table 3.4: GUI - non-functional requirements
No. | Non-functional requirement | Type
1 | The graphical user interface should have a quick response time for user interactions | Responsiveness
2 | The graphical user interface should present an intuitive design | Readability
3 | The graphical user interface should output the text body of the tweets analysed | Usability
An important challenge was to make the interface robust, so that a user can follow the
sentiment evolution of two or more topics at the same time. This was accomplished using
Flask, a third-party micro framework in Python, which offers a built-in development
server with RESTful request dispatching [20].
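A minimal sketch of such an endpoint (the route name and the analyse_keyword helper are hypothetical; the project's actual interface differs):

from flask import Flask, jsonify, request

app = Flask(__name__)

def analyse_keyword(keyword):
    # placeholder for the engine call: fetch tweets and score their sentiment
    return {"keyword": keyword, "sentiment": "positive"}

@app.route("/sentiment")
def sentiment():
    keyword = request.args.get("keyword", "")
    return jsonify(analyse_keyword(keyword))

if __name__ == "__main__":
    app.run(debug=True)  # Flask's built-in development server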
CHAPTER 4
IMPLEMENTATION AND RESULT
4.1 Engine
The engine was built using “the pipeline” (introduced in Chapter 3) and had
to be adapted for two purposes. One is to serve the real-time interface
(described in later subsections). The second is to run
continuously at a pre-established interval of time in order to perform in-
depth sentiment analysis for a spectrum of brands and topics. This is reflected in
the separation of the data-gathering component into two modules.
In the testing phase, unseen labelled data and the probabilistic model produced in the
training phase were used to measure the accuracy of the algorithm. The
algorithm produced a classification accuracy of 71%. For the development stage
at which it was implemented, such accuracy was expected, yet further improvements
were required. As the data set used in training had labels only for positive and
negative tweets, while the goal is to distinguish between three classes (positive,
negative, neutral), a further heuristic was required. A sentiment evaluation scale from
-1 (negative) to 1 (positive) was introduced, where tweets whose sentiment score fell in
the range [-0.05, 0.05] were considered neutral. The limits of the range were
established after a manual inspection of the tweets analysed. The following subsection
describes the implementation of heuristics to improve the performance and the
classification accuracy of the algorithm.
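The neutral band can be expressed directly in code; a small sketch of the thresholding described above (the function name is ours):

def label_from_score(score):
    # Map a sentiment score in [-1, 1] to a 3-way label,
    # treating [-0.05, 0.05] as neutral.
    if -0.05 <= score <= 0.05:
        return "neutral"
    return "positive" if score > 0.05 else "negative"

print(label_from_score(0.02))  # neutral
print(label_from_score(0.4))   # positive
print(label_from_score(-0.3))  # negative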
While Porter’s algorithm reduced the corpus to 60% of its initial size after
tokenization, this was not sufficient, as a significant number of the remaining tokens held
no particular sentiment value. Verifying manually which tokens were prevalent in positive
or negative contexts was not a solution, due to the high amount of manual work. An
interesting heuristic using the Pareto principle (explained in Chapter 2) was therefore
applied. Looking at the token distribution of the training model (Figure 4.3), it can be
noticed that words such as “I” and “the” have the highest occurrence throughout the
whole input data; separated from their context, however, those words do not hold any
sentiment value. At the opposite end are words which, although they might carry some
sentiment value, occur so rarely in the corpus that their impact on the model is almost
irrelevant. By contrast, words situated on the curve’s slope in the figure appeared to
have a strong sentiment value (e.g. “love”, “hate”, “amazing”). Following the Pareto
principle, the corpus was cut to 20% of its size, so that only the tokens with a high
sentiment value were kept. A manual inspection was required to determine where tokens
with low sentiment value start to appear in the distribution, in order to determine the
cut points. As a result, the final size of the model was reduced to 40,000 tokens, and
the accuracy was improved by 7% as a result of the removal of noise (unwanted data).
Negation handling was implemented next: for every token x that occurs within the scope
of a negation, “a new feature (token) NOT x is created” [4]. For example, after performing
negation handling, the sentence “I do not like this new brand” will result in the
following representation: “I do not NOT_like NOT_this NOT_new NOT_brand”.
The advantage of this feature is that the plain occurrence and the negated occurrence
of a word are now distinguishable in the corpus. This improved the accuracy of the
classifier by a further 1.8%, up to 82.71% overall accuracy (more details are
presented in Chapter 5). A minimal sketch of this transformation is given below.
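The sketch below is simplified: the negation scope here runs to the end of the token list, which is one common approximation, and the set of negation words is our assumption:

NEGATIONS = {"not", "no", "never", "n't"}

def handle_negation(tokens):
    # Prefix every token following a negation word with NOT_.
    out, negated = [], False
    for tok in tokens:
        out.append("NOT_" + tok if negated else tok)
        if tok.lower() in NEGATIONS:
            negated = True
    return out

print(handle_negation("I do not like this new brand".split()))
# ['I', 'do', 'not', 'NOT_like', 'NOT_this', 'NOT_new', 'NOT_brand']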
CODING
from textblob import TextBlob
import matplotlib.pyplot as plt
import tweepy  # needed for the Twitter API calls below (missing from the original listing)
import csv     # needed for writing result.csv
import re      # needed by cleanTweet()


class SentimentAnalysis:

    def __init__(self):
        self.tweets = []
        self.tweetText = []

    def DownloadData(self):
        # authenticating (credentials as given in the original report)
        consumerKey = 'dRUmSEf6tflbDEbpWGjgCKOGc'
        consumerSecret = 'nQKlLtDBUQvr92nAcIz8hZAwKK5FRFDudR8uI4OJs2qb9qQeiR'
        accessToken = '481852749-QP10nINCYjg4w4DEoYGYfGDeZr7aSxyLbc3joORT'
        accessTokenSecret = 'l3ctmoHnhl59niXwARWYft0CLx6XHZzvCt72igG6njCn3'
        auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
        auth.set_access_token(accessToken, accessTokenSecret)
        api = tweepy.API(auth)

        searchTerm = input("Enter Keyword/Tag to search about: ")
        NoOfTerms = int(input("Enter how many tweets to search: "))

        # searching for tweets (api.search is the tweepy 3.x call)
        self.tweets = tweepy.Cursor(api.search, q=searchTerm, lang="en").items(NoOfTerms)

        # open/create a file to append data to
        csvFile = open('result.csv', 'a')
        csvWriter = csv.writer(csvFile)

        # running total of polarity, plus counters for each polarity bucket
        polarity = 0
        positive = 0
        wpositive = 0
        spositive = 0
        negative = 0
        wnegative = 0
        snegative = 0
        neutral = 0

        # iterating through tweets fetched
        for tweet in self.tweets:
            # append to tweetText so that we can store it in the csv later (UTF-8 encoded)
            self.tweetText.append(self.cleanTweet(tweet.text).encode('utf-8'))

            analysis = TextBlob(tweet.text)
            polarity += analysis.sentiment.polarity  # adding up polarities to find the average later

            # bucket the tweet by its polarity score
            if analysis.sentiment.polarity == 0:
                neutral += 1
            elif 0 < analysis.sentiment.polarity <= 0.3:
                wpositive += 1
            elif 0.3 < analysis.sentiment.polarity <= 0.6:
                positive += 1
            elif 0.6 < analysis.sentiment.polarity <= 1:
                spositive += 1
            elif -0.3 < analysis.sentiment.polarity <= 0:
                wnegative += 1
            elif -0.6 < analysis.sentiment.polarity <= -0.3:
                negative += 1
            elif -1 < analysis.sentiment.polarity <= -0.6:
                snegative += 1

        csvWriter.writerow(self.tweetText)
        csvFile.close()

        # converting counts to percentages of how people are reacting
        positive = self.percentage(positive, NoOfTerms)
        wpositive = self.percentage(wpositive, NoOfTerms)
        spositive = self.percentage(spositive, NoOfTerms)
        negative = self.percentage(negative, NoOfTerms)
        wnegative = self.percentage(wnegative, NoOfTerms)
        snegative = self.percentage(snegative, NoOfTerms)
        neutral = self.percentage(neutral, NoOfTerms)

        # finding the average polarity
        polarity = polarity / NoOfTerms

        print("How people are reacting on " + searchTerm + " by analyzing " + str(NoOfTerms) + " tweets.")
        print()
        print("General Report: ")
        if polarity == 0:
            print("Neutral")
        elif 0 < polarity <= 0.3:
            print("Weakly Positive")
        elif 0.3 < polarity <= 0.6:
            print("Positive")
        elif 0.6 < polarity <= 1:
            print("Strongly Positive")
        elif -0.3 < polarity <= 0:
            print("Weakly Negative")
        elif -0.6 < polarity <= -0.3:
            print("Negative")
        elif -1 < polarity <= -0.6:
            print("Strongly Negative")

        print()
        print("Detailed Report: ")
        print(str(positive) + "% people thought it was positive")
        print(str(wpositive) + "% people thought it was weakly positive")
        print(str(spositive) + "% people thought it was strongly positive")
        print(str(negative) + "% people thought it was negative")
        print(str(wnegative) + "% people thought it was weakly negative")
        print(str(snegative) + "% people thought it was strongly negative")
        print(str(neutral) + "% people thought it was neutral")

        self.plotPieChart(positive, wpositive, spositive, negative, wnegative, snegative, neutral, searchTerm, NoOfTerms)

    def cleanTweet(self, tweet):
        # remove mentions, special characters and links from the tweet text
        return ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    # function to calculate percentage
    def percentage(self, part, whole):
        temp = 100 * float(part) / float(whole)
        return format(temp, '.2f')

    def plotPieChart(self, positive, wpositive, spositive, negative, wnegative, snegative, neutral, searchTerm, noOfSearchTerms):
        labels = ['Positive [' + str(positive) + '%]',
                  'Weakly Positive [' + str(wpositive) + '%]',
                  'Strongly Positive [' + str(spositive) + '%]',
                  'Neutral [' + str(neutral) + '%]',
                  'Negative [' + str(negative) + '%]',
                  'Weakly Negative [' + str(wnegative) + '%]',
                  'Strongly Negative [' + str(snegative) + '%]']
        sizes = [positive, wpositive, spositive, neutral, negative, wnegative, snegative]
        colors = ['yellowgreen', 'lightgreen', 'darkgreen', 'gold', 'red', 'lightsalmon', 'darkred']
        patches, texts = plt.pie(sizes, colors=colors, startangle=90)
        plt.legend(patches, labels, loc="best")
        plt.title('How people are reacting on ' + searchTerm + ' by analyzing ' + str(noOfSearchTerms) + ' Tweets.')
        plt.axis('equal')
        plt.tight_layout()
        plt.show()


if __name__ == "__main__":
    sa = SentimentAnalysis()
    sa.DownloadData()
SNAPSHOTS
Fig. 4.5: Retrieving the tweets and categorizing them according to polarity
as positive, negative and neutral
CHAPTER 5
TESTING
CHAPTER 6
DIFFERENCE BETWEEN PROPOSED AND
EXISTING SYSTEM
CHAPTER 7
CONCLUSION AND FUTURE SCOPE
7.1 Conclusion
Considering the above reflection on the project’s planning, it can be concluded that
the scope of the project was met within the timeline set. As such, the desired
classification accuracy of 80% or more was reached. The reporting is performed in real
time and the graphical user interface offers an intuitive, yet comprehensive user
interaction. Overall, the project was a good opportunity to further hone my
programming skills and expand my knowledge in the fields of natural language
processing and machine learning. By designing and developing this project, my
overall knowledge and skills in computer science were significantly improved. As
such, a better understanding of the machine learning field was gained, while
implementing the classification algorithm and the subsequent heuristics to improve its
accuracy. In addition, fundamental knowledge in the field of natural language
processing was acquired.
On the same note, the previously acquired Python skills were sharpened by
mastering advanced concepts such as generators and decorators. From the perspective
of the application domain, the online marketing industry, a number of lessons were
drawn. Deploying the software in industry has shown that a sentiment analysis tool is
fundamental to the success of a company’s marketing campaign, yet a more specific,
topic-oriented product is desirable. However, the deliverables would include similar
functionalities. From an implementation perspective, the benefit of developing the
classification algorithm from scratch was reflected in full control over the
engine’s capabilities. The pipeline produced helped keep the code organized and
served as a useful debugging tool. However, a second iteration of the
project would employ more in-depth research of the algorithms available for
classification, deprioritizing the pipeline to later stages of the development.
Future development will include modules to extract information from other social media
platforms: Facebook, which provides a comprehensive Python API, and YouTube.
Additionally, a more complex spectrum of sentiment values will be introduced
(e.g., joy, anger, hope, etc.).
REFERENCES
[1] Sisira Neti, S. (2011). Social Media and its Role in Marketing. International
Journal of Enterprise Computing and Business Systems, 1st ed. [ebook] Available at:
http://www.ijecbs.com/July2011/13.pdf [Accessed 13 Apr. 2016].
[2] Shatkay, H. and Craven, M. (2012). Mining the Biomedical Literature.
Cambridge, Mass.: MIT Press. Nlp.stanford.edu (2016). Stemming and
lemmatization. [online] Available at:
http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-
1.html [Accessed 17 Apr. 2016].
[3] Busino, G. (ed.) (1965). Oeuvres complètes. Vilfredo Pareto 1848-1923. Geneva:
Droz. Russell, S. and Norvig, P. (2003) [1995]. Artificial Intelligence: A
Modern Approach (2nd ed.). Prentice Hall. ISBN 978-0137903955.
[4] Konstantinova, N. (2014). Machine learning explained in simple words.
[online] Nkonst.com. Available at: http://nkonst.com/machine-
learning-explained-simple-words/ [Accessed 18 Apr. 2016].
[5] Miskovic, V. (2001). Application of inductive machine learning in data mining.
Vojnotehnicki glasnik, 49(4-5), pp. 429-438. Slideshare.net (2016). Lucene/Solr
Revolution 2015. [online] Available at:
http://www.slideshare.net/joaquindelgado1/lucenesolr-revolution-2015-where-
search-meets-machine-learning [Accessed 16 Apr. 2016].
[6] Saedsayad.com (2016). Naïve Bayesian. [online] Available at:
http://www.saedsayad.com/Naïve_bayesian.htm [Accessed 21 Apr. 2016].
[7] Docs.opencv.org (2016). Introduction to Support Vector Machines, OpenCV
2.4.13.0 documentation. [online] Available at:
http://docs.opencv.org/2.4/doc/tutorials/ml/introduction_to_svm/
introduction_to_svm.html [Accessed Apr. 2016].
[8] Codefellows.org (2016). 5 Reasons why Python is Powerful Enough for Google.
[online] Available at: https://www.codefellows.org/blog/5-reasons-why-python-is-