CU6051NA - Artificial Intelligence 20% Individual Coursework 2019-20 Autumn
I confirm that I understand my coursework needs to be submitted online via Google Classroom under the relevant
module page before the deadline in order for my assignment to be accepted and marked. I am fully aware that late
submissions will be treated as non-submission and a mark of zero will be awarded.
Contents
1. Introduction
1.1. Explanation of the AI concept chosen
1.1.1. Sentiment Analysis
1.2. Explanation/introduction of the chosen problem domain/topic
2. Background
2.1. Sentiment Analysis and its approaches
2.1.1. Approaches
2.2. Research works done on Sentiment Analysis
2.3. Current applications of Sentiment analysis
3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
3.2. Explanation of the AI algorithm
3.3. Pseudocode
3.4. Flowchart
4. Conclusion
4.1. Analysis of the work done
4.2. Solution addressing the real-world problems
4.3. Further work
5. Bibliography
1. Introduction
Artificial intelligence (AI) is the simulation of human intelligence processes by machines,
especially computer systems. It is the ability of a digital computer to perform tasks commonly
associated with intelligent beings. The term is frequently applied to the project of developing
systems endowed with the intellectual processes characteristic of humans, such as the ability to
reason, discover meaning, generalize, or learn from past experience. Despite continuing
advances in computer processing speed and memory capacity, there are as yet no programs that
can match human flexibility over wider domains or in tasks requiring much everyday knowledge.
On the other hand, some programs have attained the performance levels of human experts and
professionals in performing certain specific tasks, so that artificial intelligence in this limited
sense is found in applications as diverse as medical diagnosis, computer search engines, and
voice or handwriting recognition. While the huge volume of data that’s being created on a daily
basis would bury a human researcher, AI applications that use machine learning can take that
data and quickly turn it into actionable information [Eri17]. AI has recently become so
pervasive that we often do not realize we are using it, for example on social networking sites
such as Facebook, YouTube and Instagram, which show content based on our interests. Moreover,
Google's AI helps with image recognition, the voice assistant on Android devices, and so on.
Hence, AI is a wide-ranging branch of computer science concerned with building smart machines
[Fed16].
Supervised learning: Here, the data sets are labeled so that patterns can be detected and
used to label new data sets.
1|Page
Renish Gautam
CU6051NI Artificial Intelligence
Unsupervised learning: Here, data sets are not labeled and are sorted according to
similarities or differences.
Semi-supervised learning: This includes self-training, multi-view learning and
self-ensembling. Self-training uses a model's own predictions on unlabeled data to add to the
labeled data set.
Reinforcement learning: Here, data sets are not labeled but, after performing an action
or several actions, the AI system is given feedback [the20].
However, machine learning remains a relatively 'hard' problem, particularly when adapting
existing algorithms and models to work well for a new application.
[Cal19] Sentiment analysis provides some answers about what the most important issues are, at
least from the perspective of customers. Because sentiment analysis can be automated, decisions
can be based on a significant amount of data rather than on plain intuition, which is not always
right [Jos20].
Basic sentiment analysis of text follows a straightforward process. First, the text document is
broken down into its component parts, such as sentences, phrases, tokens and parts of speech.
Next, each sentiment-bearing phrase and component is identified. A sentiment score is then
assigned to each identified phrase and component. Optionally, these scores can be merged into
multi-layered sentiment scores [Lex20].
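The steps just described (break the text into tokens, identify the sentiment-bearing ones, assign scores, then combine them) can be illustrated with a minimal sketch. The lexicon, its scores and the function names below are hypothetical and chosen only for illustration; a real system would use a much larger, curated lexicon.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment score (illustrative values only).
LEXICON = {"love": 2.0, "great": 1.5, "good": 1.0,
           "boring": -1.0, "bad": -1.5, "hate": -2.0}


def tokenize(text):
    """Step 1: break the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tokens(tokens):
    """Steps 2 and 3: keep sentiment-bearing tokens and attach their scores."""
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]


def document_score(text):
    """Step 4: combine the token scores into one score for the whole text."""
    return sum(score for _, score in score_tokens(tokenize(text)))


def polarity(text):
    """Map the combined score to a polarity label."""
    total = document_score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For instance, `polarity("I love this video, great song")` sums the scores of "love" and "great" and returns `"positive"`, while a comment with no lexicon words is labelled `"neutral"`.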
For many people, YouTube is a place to watch music videos, comedy shows, how-to guides, recipes,
hacks and more. YouTube can be a great space for teens to discover things they like. It is a
growing platform with a simple video-sharing service on which users can watch, like, share and
comment on videos and upload their own. A YouTuber's main challenge is to collect all the
relevant comments on a single video and summarize the overall response to it, which is very
time-consuming. Using sentiment analysis, a YouTuber can easily learn what viewers think without
spending a lot of time. However, not every comment on a video is the same, and different kinds
of emotion are attached to comments. Some people may react badly to any type of disagreement,
while others may even thrive on it. Sentiment analysis is used to determine the sentiment of
each comment.
At times, YouTube comments can be so toxic that they attack people personally or target their
religion and gender. About 500 million comments are deleted. Many YouTubers have complained
about the effect that hate comments have had on their videos. This toxicity seems to have a
serious impact on how many people engage in conversation, and discourages some from engaging in
online conversation altogether. As a result, online platforms struggle to facilitate connections
effectively, resulting in many small groups
2. Background
2.1. Sentiment Analysis and its approaches
Various factors determine the sentiment of speech or text, so sentiment analysis is not a
straightforward procedure. Text information can typically be divided into two main types:
facts and opinions. Opinions are of two types: comparative and direct. Direct opinions give
an opinion about an entity directly [Sne17].
There are numerous types of sentiment analysis. Important types include systems that focus on
polarity (positive, negative, neutral) and systems that detect feelings and emotions or identify
intentions. Emotions such as disappointment, frustration or anxiety (i.e. negative feelings) or
joy, affection or excitement (i.e. positive feelings) are correlated with the polarity of a
text. Machine learning and lexicon-based algorithms are used to detect emotions and feelings in
texts. Relying on lexicons is tricky, as the way people express their emotions varies greatly,
and so do the lexical items they use.
2.1.1. Approaches
Many methods and algorithms have been introduced to extract sentiment from text. Computational
linguistics is such a large field that research is still ongoing to improve the accuracy of the
results these methods provide. The sentiment analysis approaches are:
Rule-based: This approach defines a set of rules, via some form of scripting language, that
identify subjectivity, polarity or the subject of an opinion. Its inputs can include classic
NLP techniques such as tokenization, part-of-speech tagging, stemming and parsing, as well as
other resources such as lexicons [Mon20].
Automatic: This approach learns from data using machine learning techniques. The task is
modeled as a classification problem in which a classifier is fed a text and returns the
corresponding sentiment, e.g. negative, positive or neutral. In the first step, training, the
model learns from the training samples to associate a specific input with the respective
output: the pairs of feature vectors and tags (e.g. positive, negative or neutral) are fed into
the machine learning algorithm to generate a model. The second step is prediction, in which the
feature extractor transforms the unseen text inputs into feature vectors. When those feature
vectors are fed into the model, the predicted tags are generated. Naïve Bayes, Logistic
Regression, Support Vector Machines and Neural Networks are commonly used supervised learning
classification algorithms [Mon20].
Hybrid: The concept of hybrid methods is intuitive: combine the best of both worlds, the
rule-based and the automatic approaches. Combining the two can usually improve accuracy and
precision [Mon20].
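The two steps of the automatic approach, training on pairs of feature vectors and tags and then predicting tags for unseen text, can be sketched as follows. This is only an illustration under stated assumptions: the `BagOfWords` extractor, the `NearestCentroid` stand-in classifier and the example comments are all hypothetical, and the classifier is deliberately simpler than the algorithms named above.

```python
from collections import Counter


class BagOfWords:
    """Feature extractor: maps a text to a vector of word counts."""

    def fit(self, texts):
        vocab = sorted({w for t in texts for w in t.lower().split()})
        self.index = {w: i for i, w in enumerate(vocab)}
        return self

    def transform(self, texts):
        vectors = []
        for t in texts:
            counts = Counter(w for w in t.lower().split() if w in self.index)
            vec = [0] * len(self.index)
            for w, c in counts.items():
                vec[self.index[w]] = c
            vectors.append(vec)
        return vectors


class NearestCentroid:
    """Stand-in classifier: predicts the tag whose mean feature vector is closest."""

    def fit(self, vectors, tags):
        sums, counts = {}, Counter(tags)
        for vec, tag in zip(vectors, tags):
            acc = sums.setdefault(tag, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
        self.centroids = {t: [s / counts[t] for s in acc] for t, acc in sums.items()}
        return self

    def predict(self, vectors):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda t: dist(vec, self.centroids[t]))
                for vec in vectors]


# Step 1 (training): made-up comments paired with their tags.
train_texts = ["great video love it", "love this song",
               "bad video hate it", "hate this boring song"]
train_tags = ["positive", "positive", "negative", "negative"]
extractor = BagOfWords().fit(train_texts)
model = NearestCentroid().fit(extractor.transform(train_texts), train_tags)

# Step 2 (prediction): unseen texts go through the same feature extractor.
predictions = model.predict(extractor.transform(["love it", "boring and bad"]))
```

The point of the design is that the same fitted extractor is reused at prediction time, so unseen text is mapped into the feature space the model was trained on; words outside the training vocabulary are simply ignored.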
In their journal article "Sentiment Analysis of YouTube Comments on Koha Open Source Software
Videos", Lambodara Parabhoi and Payel Saha conducted sentiment analysis on a total of 404
comments on Koha ILS videos on YouTube. The main objective of the project was to analyze
whether the comments were positive, negative or neutral. The article discusses using the Naïve
Bayes algorithm for sentiment analysis. The authors used the Parallel Dots API and Google
Spreadsheet with the AYLIEN Text Analysis API. The analysis covered categories such as
intention, subjectivity and sentiment, emotion and word frequency [Par18].
In another study, Joe Timoney, Adarsh Raj and Brian Davis conducted sentiment analysis on
comments extracted from YouTube song videos. 250 song titles were gathered and a total of 100
comments were extracted from these videos. Classification approaches such as Naïve Bayes and
decision trees, together with cross-validation techniques and evaluation metrics, were
discussed. Two machine learning algorithms were tested: Naïve Bayes and decision trees. The
accuracy obtained was 79% with Naïve Bayes and 86.09% with decision trees [Tim19].
In a third study, the authors proposed a Natural Language Processing (NLP) based sentiment
analysis approach for user comments on YouTube. They demonstrated the effectiveness of the
scheme through data-driven experiments, measured by its accuracy in finding popular and
high-quality videos. The NLP process consisted of four steps: comment collection and
preprocessing, generation of data sets, sentiment measures and video rating [Placeholder1].
social influencers?
decision.
So much so that nearly 40% of those who participated in the survey said they had
By tracking sentiment in your industry and searching specific keywords, you can
track influencers talking about your product and engage with their fans as well.
3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
Taking the above research and explanations into account, it is clear that sentiment analysis
can be applied to areas such as:
Brand Monitoring
Customer Support
Customer Feedback
Product Analytics, etc.
Among the many approaches to sentiment analysis, supervised learning is preferable for the
proposed problem of predicting the sentiment of YouTube comments. Among the many algorithms
under supervised learning, Naïve Bayes is the algorithm chosen for predicting the sentiment.
Training datasets for the YouTube comments are gathered from Kaggle.
Fast
Requires less training data
Highly scalable
It can make probabilistic prediction
It is easy to implement
It works more efficiently than other algorithms if the independence assumption holds.
[edu201]
For example, a fruit may be considered an apple if it is red, round and around 3 inches in
diameter. Even if these characteristics depend on each other or on the existence of the other
characteristics, they are all treated as contributing independently to the probability that the
fruit is an apple, which is why the method is called 'Naive'.
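Written out in symbols, the naive independence assumption for the apple example could look like the sketch below. The feature names are taken from the example; the notation (class c, features x_1 to x_n) is an assumption chosen here for illustration.

```latex
% General form: the features x_1, ..., x_n are treated as independent given the class c.
\[
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
\]
% Apple example: each characteristic contributes its own factor.
\[
P(\text{apple} \mid \text{red}, \text{round}, \text{3 in})
  \propto P(\text{apple}) \, P(\text{red} \mid \text{apple}) \,
          P(\text{round} \mid \text{apple}) \, P(\text{3 in} \mid \text{apple})
\]
```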
A Naive Bayes model is simple to build and especially useful for very large data sets. Along
with its simplicity, Naive Bayes is known to outperform even highly sophisticated
classification methods [Sun17].
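Bayes' theorem provides a way to calculate the posterior probability P(c|x) from P(c), P(x) and
P(x|c). Look at the equation underneath:
\[ P(c \mid x) = \frac{P(x \mid c) \, P(c)}{P(x)} \]
Here,
P(c|x) is the posterior probability of class c given predictor x,
P(c) is the prior probability of the class,
P(x|c) is the likelihood, i.e. the probability of the predictor given the class,
P(x) is the prior probability of the predictor.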
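As a worked illustration of Bayes' theorem, P(c|x) = P(x|c) P(c) / P(x), the sketch below computes the probability that a comment is positive given that it contains the word "love". All probabilities are made-up numbers chosen for the example, not values from any dataset.

```python
def posterior(prior_c, likelihood_x_given_c, evidence_x):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood_x_given_c * prior_c / evidence_x


# Made-up probabilities for illustration only.
p_pos = 0.6               # P(positive): prior probability of a positive comment
p_neg = 0.4               # P(negative)
p_love_given_pos = 0.3    # P("love" | positive): likelihood
p_love_given_neg = 0.05   # P("love" | negative)

# P("love") via the law of total probability over both classes.
p_love = p_love_given_pos * p_pos + p_love_given_neg * p_neg

# Posterior P(positive | "love"), which comes out at roughly 0.9.
p_pos_given_love = posterior(p_pos, p_love_given_pos, p_love)
```

Even though only 30% of positive comments contain the word in this made-up setting, seeing it raises the probability of the positive class from the prior of 0.6 to about 0.9, because the word is far rarer in negative comments.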
3.3. Pseudocode
Import necessary libraries
Read dataset and separate sentiment text and its sentiment label.
Remove stopwords.
Tokenization.
model=naive_bayes.MultinomialNB()
model.fit(X_train,y_train)
my_vectorizer=vectorizer.transform(my_test_data)
model.predict(my_vectorizer)
Compare the predicted response values with the actual response values.
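The pseudocode steps can be made concrete with a standard-library-only sketch. It is a simplified stand-in for the `naive_bayes.MultinomialNB` model named in the pseudocode, not that library's implementation; the stopword list, the class `MultinomialNaiveBayes`, the helper names and the example comments are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "it", "this", "i", "so", "and", "was"}


def tokenize(text):
    """Lowercase the text, split it into word tokens and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_posterior(self, tokens, label):
        # log P(c) + sum of log P(x_i | c); P(x) is equal for all classes, so it is omitted.
        total = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, texts):
        return [max(self.class_counts, key=lambda c: self._log_posterior(tokenize(t), c))
                for t in texts]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up comments standing in for a labelled dataset.
train_texts = ["I love this song", "great video", "amazing voice love it",
               "this is bad", "I hate this video", "boring and bad"]
train_labels = ["positive", "positive", "positive", "negative", "negative", "negative"]
test_texts = ["love the voice", "bad boring video"]
test_labels = ["positive", "negative"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
predicted = model.predict(test_texts)
score = accuracy(predicted, test_labels)
```

Working in log-probabilities avoids numerical underflow when many small word probabilities are multiplied, and add-one smoothing keeps a word that never occurred with a class from forcing that class's probability to zero.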
3.4. Flowchart
4. Conclusion
5. Bibliography
Bhuiyan, H., Ara, J., Bardhan, R. & Islam, R. (2017) Retrieving YouTube Video by Sentiment
Analysis on User Comment. Proc. of the 2017 IEEE International Conference on Signal and Image
Processing Applications, p.478.
Jadav, S. (2017) Sentiment Analysis: A Review. Scientific Journal of Impact Factor (SJIF): 4.72,
p.962.
Parabhoi, L. & Saha, P. (2018) Sentiment Analysis of YouTube Comments on Koha Open Source
Software Videos. International Journal of Library and Information Studies, 8, p.102.
Pozzi, F.A. (2016) Sentiment Analysis in Social Networks. In Sentiment Analysis in Social
Networks. 1st ed. Morgan Kaufmann. p.284.
Ray, S. (2017) 6 Easy Steps to Learn Naive Bayes Algorithm with codes in Python and R
[Online]. Available from:
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/?fbclid=IwAR1-5mSCWS8WwOHc3B6OJPy8-R73G3OqTxDWn42c528CoOZO2jw5BQYXmSM
[Accessed 11 September 2017].
Timoney, J., Raj, A. & Davis, B. (2019) Nostalgic Sentiment Analysis of YouTube Comments for
Chart Hits of the 20th Century. Maynooth: Dept. of Computer Science, Maynooth University.