
CU6051NA - Artificial Intelligence

20% Individual Coursework

2019-20 Autumn

Student Name: Renish Gautam

London Met ID: 17031035

College ID: np01cp4a170052

Assignment Due Date: 13th January 2020

Assignment Submission Date: 13th January 2020

I confirm that I understand my coursework needs to be submitted online via Google Classroom under the relevant
module page before the deadline in order for my assignment to be accepted and marked. I am fully aware that late
submissions will be treated as non-submission and a mark of zero
Contents
1. Introduction
1.1. Explanation of the AI concept chosen
1.1.1. Sentiment Analysis
1.2. Explanation/introduction of the chosen problem domain/topic
2. Background
2.1. Sentiment Analysis and its approaches
2.1.1. Approaches
2.2. Research works done on Sentiment Analysis
2.3. Current applications of Sentiment analysis
3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
3.2. Explanation of the AI algorithm
3.3. Pseudocode
3.4. Flowchart
4. Conclusion
4.1. Analysis of the work done
4.2. Solution addressing the real-world problems
4.3. Further work
5. Bibliography

Table of Figures

Figure 1: Different Approaches on sentiment analysis
Figure 2: Social Media Monitoring
Figure 3: Bayes Theorem
Figure 4: Flowchart Diagram
CU6051NI Artificial Intelligence

1. Introduction
Artificial intelligence (AI) is the simulation of human intelligence processes by machines,
especially computer systems. It is the ability of a digital computer to perform tasks commonly
associated with intelligent beings. The term is frequently applied to the project of developing
systems endowed with the intellectual processes characteristic of humans, such as the ability to
reason, discover meaning, generalize, or learn from past experience. Despite continuing
advances in computer processing speed and memory capacity, there are as yet no programs that
can match human flexibility over wider domains or in tasks requiring much everyday knowledge.
On the other hand, some programs have attained the performance levels of human experts and
professionals in certain specific tasks, so that artificial intelligence in this limited
sense is found in applications as diverse as medical diagnosis, computer search engines, and
voice or handwriting recognition. While the huge volume of data created on a daily basis would
bury a human researcher, AI applications that use machine learning can take that data and
quickly turn it into actionable information (Cambria, 2017). AI has recently become so
widespread that we often do not realize we are using it, for example on social networking sites
such as Facebook, YouTube and Instagram, which show content based on our interests. Moreover,
Google’s AI helps with image recognition, the voice assistant on Android devices, and so on.
Hence, AI is a wide-ranging branch of computer science concerned with building smart machines
(Pozzi, 2016).

Machine learning is the science of getting a computer to act without being explicitly programmed.
It is an application of AI. Deep learning is a subset of machine learning that, in very simple
terms, can be thought of as the automation of predictive analytics. Such computer programs are
allowed to learn, modify, develop and grow by themselves when introduced to new data. The process
of machine learning begins with observations of data, such as direct experience or instruction,
in order to look for patterns in the data and make better decisions in the future based on the
examples provided. There are four types of machine learning algorithms:

• Supervised learning: Here, the data sets are labeled so that patterns can be detected and
used to label new data sets.
• Unsupervised learning: Here, data sets are not labeled and are sorted according to
similarities or differences.
• Semi-supervised learning: Here, a small labeled data set is combined with unlabeled data.
Techniques include self-training, multi-view learning, and self-ensembling. Self-training uses
a model’s own predictions on unlabeled data to add to the labeled data set.
• Reinforcement learning: Here, data sets are not labeled but, after performing an action
or several actions, the AI system is given feedback (theappsolutions, 2020).

However, machine learning remains a relatively hard problem, particularly when adapting existing
algorithms and models to work well for a new application.

1.1. Explanation of the AI concept chosen


Social media platforms today contain rapidly changing information generated by millions of users
that can dramatically affect a person’s image or the reputation of an organization. This shows the
importance of sentiment analysis. YouTube, as a unique platform, is multimodal and contains a
social graph and discussions between people with various opinions. Those opinions might be
positive, negative or neutral. The YouTube API is not effective at ordering comments by
relevance, although it claims to do so. As a result, the comments returned as most relevant do
not match the top comments at all, and they are not even sorted by likes or replies. I therefore
found it important for the community to conduct sentiment analysis research on YouTube comments.

1.1.1. Sentiment Analysis


Sentiment analysis is the process of analyzing online pieces of writing to determine the
emotional tone they carry. In other words, it is the automated process of classifying online
text data as positive, neutral or negative, giving businesses the opportunity to gain a deeper
understanding of how customers perceive their product, brand or service. Sentiment analysis is
currently a topic of great interest and development because it has many practical applications.
Companies use it to automatically analyze survey responses, product reviews and social media
comments in order to get valuable insights about their brands, products and services. It helps
data analysts in a large number of businesses to collect public opinion, conduct complex market
research, monitor brand and product reputation, analyze comments and understand the end-user
experience (Miner, 2019). Sentiment analysis provides some answers as to what the most important
issues are, at least from the perspective of customers. Because sentiment analysis can be
automated, decisions can be made based on a significant amount of data rather than on plain
intuition, which is not always right (Hardy, 2020).

Basic sentiment analysis of text follows a straightforward process. First, the text document is
broken down into its component parts, such as sentences, phrases, tokens and parts of speech.
Next, each sentiment-bearing phrase and component is identified. A sentiment score is then
assigned to each of these phrases and components. Optionally, the scores can be combined for
multi-layered sentiment analysis (lexalytics, 2020).
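
The process described above can be sketched in a few lines of Python. This is only a minimal illustration and not the method of any particular tool: the tiny LEXICON, its scores, and the function names tokenize and score_text are all assumptions made up for this example.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment score (values are illustrative only).
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}


def tokenize(text):
    """Break text into lowercase word tokens, dropping punctuation and digits."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text):
    """Score the sentiment-bearing tokens and map the combined score to a label."""
    tokens = tokenize(text)
    # Keep only the tokens that carry sentiment according to the lexicon.
    scored = [(token, LEXICON[token]) for token in tokens if token in LEXICON]
    total = sum(score for _, score in scored)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, total, scored


if __name__ == "__main__":
    for comment in ["I love this video, great editing!", "Terrible audio.", "Uploaded in 2019"]:
        print(comment, "->", score_text(comment)[0])
```

A fixed word list like this cannot handle negation or sarcasm, which is one reason the later sections turn to a learned classifier.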

1.2. Explanation/introduction of the chosen problem domain/topic

Many people use YouTube to watch music videos, comedy shows, how-to guides, recipes, hacks and
more. YouTube can be a great space for teens to discover things they like. It is one of the
growing platforms, offering a simple video-sharing service on which users can watch, like,
share, comment on and upload videos. The main challenge for YouTubers is to collect all the
relevant comments and summarize the overall response to a single video, which is very
time-consuming. By using sentiment analysis, a YouTuber can easily learn what viewers think
without spending a lot of time. However, not every comment on a video is the same, and different
kinds of emotion are attached to comments. Some people may react badly to any type of
disagreement, while others may even thrive on it. Sentiment analysis is used to determine the
sentiment of each comment.

At times, the comments on YouTube can be so toxic that they attack people personally, including
on the basis of religion and gender. About 500 million comments are deleted. Many YouTubers have
complained about the effect that hate comments have had on their videos. This toxicity seems to
have a serious impact on how many people engage in conversation and discourages some from
engaging in online conversation altogether. As a result, online platforms struggle to facilitate
connections effectively, resulting in many small groups


2. Background
2.1. Sentiment Analysis and its approaches
Many factors determine the sentiment of a piece of speech or text, so sentiment analysis is not
a straightforward procedure. Text information can typically be divided into two main types:
facts and opinions. Opinions are of two types: direct and comparative. Direct opinions give an
opinion about an entity directly, while comparative opinions compare one entity with another
(Jadav, 2017).

There are several types of sentiment analysis. Some systems focus on polarity (positive,
negative, neutral), while others detect feelings and emotions or identify intentions. Emotions
such as disappointment, frustration or anxiety (i.e. negative feelings) and joy, affection or
excitement (i.e. positive feelings) are correlated with the polarity of a text. Machine learning
algorithms and lexicons are used to detect emotions and feelings in text. Relying on lexicons is
tricky, because the way people express their emotions varies greatly, and so do the lexical
items they use.

2.1.1. Approaches
Many methods and algorithms have been introduced to extract sentiment from text. Computational
linguistics is such a large field that research is still ongoing to improve the accuracy these
methods provide. Sentiment analysis systems are classified as follows:

• Rule-based: This approach defines a set of rules, usually in some form of scripting language,
that identify subjectivity, polarity, or the subject of an opinion. The rules can take a variety
of inputs, including classic NLP techniques such as tokenization, part-of-speech tagging,
stemming and parsing, and other resources such as lexicons (Monkey Learn, 2020).

• Automatic: This approach relies on machine learning techniques to learn from data. The task is
modeled as a classification problem in which a classifier is fed a text and returns the
corresponding sentiment, e.g. positive, negative or neutral. In the first step, training, the
model learns to associate a specific input with the respective output: a feature extractor turns
each training text into a feature vector, and pairs of feature vectors and tags (e.g. positive,
negative, or neutral) are fed into the machine learning algorithm to generate a model. The second
step is prediction, in which the feature extractor transforms unseen text inputs into feature
vectors. When those feature vectors are fed into the model, the predicted tags are generated.
Commonly used supervised classification algorithms include Naïve Bayes, Logistic Regression,
Support Vector Machines and Neural Networks (Monkey Learn, 2020).

• Hybrid: The concept of hybrid methods is intuitive: combine the best of both worlds, the
rule-based and the automatic approaches. Combining both approaches can usually improve accuracy
and precision (Monkey Learn, 2020).

Figure 1: Different Approaches on sentiment analysis
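
The feature-extraction step of the automatic approach can be illustrated with a simple bag-of-words model, in which each text becomes a vector of word counts. The sketch below is an assumption-laden illustration rather than the extractor used in this coursework: the function names build_vocabulary and to_feature_vector and the toy texts are invented for the example, and a real system would normally rely on a library vectorizer.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map every distinct token seen in the training texts to a column index."""
    vocabulary = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def to_feature_vector(text, vocabulary):
    """Count how often each vocabulary word occurs; unseen words are ignored."""
    vector = [0] * len(vocabulary)
    for token in tokenize(text):
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector


if __name__ == "__main__":
    training_texts = ["good video", "bad video", "good good music"]
    vocab = build_vocabulary(training_texts)
    print(vocab)
    # An unseen comment is transformed with the vocabulary learned during training.
    print(to_feature_vector("good music, good vibes", vocab))
```

The same vocabulary must be reused at prediction time so that each column of the vector means the same word during training and prediction.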


2.2. Research works done on Sentiment Analysis


Much research has been conducted on sentiment analysis. Some of the research papers and
journals studied are as follows:

In the journal article Sentiment Analysis of YouTube Comments on Koha Open Source Software
Videos, Lambodara Parabhoi and Payel Saha conducted sentiment analysis on a total of 404
comments on Koha ILS videos on a YouTube channel. The main objective of the project was to
analyze whether the comments were positive, negative or neutral. The article discusses using the
Naïve Bayes algorithm for sentiment analysis. The authors used the ParallelDots API and a Google
Spreadsheet with the AYLIEN Text Analysis API. The analysis covered categories such as
intention, subjectivity and sentiment, emotion and word frequency (Parabhoi & Saha, 2018).

In another study, Joe Timoney, Adarsh Raj and Brian Davis conducted sentiment analysis on
comments extracted from songs on YouTube. 250 song titles were gathered and a total of 100
comments were extracted from these videos. Classification approaches such as Naïve Bayes and
Decision Trees were discussed, along with cross-validation techniques and evaluation metrics.
Two machine learning algorithms were tested: Naïve Bayes and Decision Trees. The accuracy
obtained was 79% using Naïve Bayes and 86.09% using Decision Trees (Timoney et al., 2019).

In a third study, the authors propose a Natural Language Processing (NLP) based sentiment
analysis approach applied to user comments on YouTube. They demonstrate the effectiveness of
the scheme through data-driven experiments, measured by the accuracy of finding popular and
high-quality videos. The NLP process consisted of four stages: comment collection and
preprocessing, generation of data sets, sentiment measures, and video rating (Bhuiyan et al.,
2017).


2.3. Current applications of Sentiment analysis

Uncovering Brand Influencers

Sure, the average consumer might be boasting about your product but what about social
influencers? A recent study from Twitter found the opinions of influencers can be just as
important as a friend’s opinion when it comes to someone making a buying decision. So much so
that nearly 40% of those who participated in the survey said they had bought something just
because an influencer had posted about it on Twitter. Although finding the right influencers can
prove to be a challenge, cashing in on this millennial goldmine doesn’t have to be hard. By
tracking sentiment in your industry and searching specific keywords, you can track influencers
talking about your product and engage with their fans as well.

Social Media Monitoring

Social media monitoring is another way businesses are currently using sentiment analysis to keep
track of what customers are saying. By digging into all of your customers’ social media opinions
about your brand, you are also able to automatically categorize issues of urgency so you can
deal with them straight away. Think about it: if 3,000 customers mention your brand overnight,
that’s a lot of tweets to go through before your morning coffee. But with sentiment analysis,
the data can be automatically put into categories of positive, neutral, and negative. This can
allow your customer service team to put out urgent fires from disgruntled customers before you
say thanks to your happier ones.

Take a look at this example of a Twitter sentiment analysis of major U.S. airlines over a month
in 2015:


Figure 2: Social Media Monitoring

3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
Taking the above research and explanations into account, it is clear that sentiment analysis can
be used for various purposes, such as:

• Brand Monitoring
• Customer Support
• Customer Feedback
• Product Analytics, etc.

Among the many approaches to sentiment analysis, supervised learning is preferred for
predicting the sentiment of YouTube comments and so solving the proposed problem. Among the
many supervised learning algorithms, Naïve Bayes is the one chosen for predicting the
sentiment. Kaggle is used to gather training datasets of YouTube comments.

Reasons for choosing Naïve Bayes are listed below:

• Fast
• Requires less training data
• Highly scalable
• It can make probabilistic predictions
• It is easy to implement
• It works more efficiently than other algorithms if the independence assumption holds
(educba, 2020).


3.2. Explanation of the AI algorithm


Naïve Bayes is a probabilistic algorithm based on Bayes’ Theorem, with an assumption of
independence between predictors. In simple terms, a Naïve Bayes classifier assumes that the
presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered an apple if it is red, round, and about 3 inches in
diameter. Even if these characteristics depend on each other or on the existence of the other
characteristics, the classifier treats each of them as contributing independently to the
probability that the fruit is an apple, which is why it is called ‘naive’.

A Naïve Bayes model is simple to build and especially useful for very large data sets. Along
with its simplicity, Naïve Bayes can outperform even highly sophisticated classification
methods (Ray, 2017).

Bayes’ Theorem provides a way to calculate the posterior probability P(c|x) from P(c), P(x) and
P(x|c). Look at the equation underneath:

P(c|x) = P(x|c) · P(c) / P(x)

Figure 3: Bayes Theorem

Here,

• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of class.
• P(x|c) is the likelihood which is the probability of predictor given class.
• P(x) is the prior probability of predictor.
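
To make these quantities concrete, the sketch below applies Bayes’ Theorem with the naive independence assumption to a few toy comments. For a comment made of words x1 ... xn, the classifier compares P(c) · P(x1|c) · ... · P(xn|c) across classes; P(x) is the same for every class, so it can be dropped from the comparison. The training samples, the function names, and the use of add-one (Laplace) smoothing are assumptions introduced for this illustration and are not taken from the coursework.

```python
import math
import re
from collections import Counter, defaultdict

# Toy labeled comments, invented for this example.
TRAINING_SAMPLES = [
    ("this is really good", "positive"),
    ("great video love it", "positive"),
    ("this was bad", "negative"),
    ("terrible boring video", "negative"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(text, class_counts, word_counts, vocabulary):
    """Return the class maximizing log P(c) + sum of log P(x_i|c)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior, log P(c)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing avoids zero likelihoods (an assumption of this sketch).
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)  # log likelihood, log P(x_i|c)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING_SAMPLES)
    for comment in ["really good video", "this was bad"]:
        print(comment, "->", predict(comment, *model))
```

Working with logarithms turns the product of many small probabilities into a sum, which avoids numerical underflow on longer comments.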

3.3. Pseudocode
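Import necessary libraries

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

Collect labeled training datasets

Read dataset and separate sentiment text and its sentiment label.

# "training data" is a placeholder path; the column names "text" and "label" are assumed
dataframe = pd.read_csv("training data")
text = dataframe["text"]
label = dataframe["label"]

Split sentiment text and sentiment label into training and testing sets

text_train, text_test, y_train, y_test = train_test_split(text, label, test_size=0.2, random_state=1)

Perform data pre-processing

Remove stopwords.

Tokenization.

Ignoring case and punctuation

Strip white space.

Remove numbers and other characters

# CountVectorizer lowercases and tokenizes; this token pattern keeps alphabetic words only
vectorizer = CountVectorizer(lowercase=True, stop_words="english", token_pattern=r"(?u)\b[a-zA-Z]{2,}\b")
X_train = vectorizer.fit_transform(text_train)
X_test = vectorizer.transform(text_test)

Train the model on training set

model = MultinomialNB()
model.fit(X_train, y_train)

Make the prediction on testing set

y_pred = model.predict(X_test)

my_test_data = ['This is really good', 'This was bad']
my_vectorizer = vectorizer.transform(my_test_data)
model.predict(my_vectorizer)

Compare real response value with the value of the expected response.

accuracy_score(y_test, y_pred)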
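
The final comparison step can be made explicit without any library. The sketch below computes the share of correct predictions and counts each (real, predicted) pair. The function names and the example labels are assumptions chosen for illustration, not results from this coursework.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the real labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(1 for real, predicted in zip(y_true, y_pred) if real == predicted)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (real, predicted) label pairs, i.e. the cells of a confusion matrix."""
    return Counter(zip(y_true, y_pred))


if __name__ == "__main__":
    # Made-up labels, used only to show the calculation.
    real = ["positive", "negative", "positive", "neutral"]
    predicted = ["positive", "positive", "positive", "neutral"]
    print("accuracy:", accuracy(real, predicted))
    for (real_label, predicted_label), n in sorted(confusion_counts(real, predicted).items()):
        print(f"real={real_label} predicted={predicted_label}: {n}")
```

Looking at the per-pair counts as well as the overall accuracy shows which sentiment classes the model confuses with each other.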

3.4. Flowchart

Figure 4: Flowchart Diagram


4. Conclusion

4.1. Analysis of the work done


This documentation includes a study of Artificial Intelligence. We understood that AI
comprises various concepts, including Machine Learning, Deep Learning and Neural Networks.
Machine Learning is a subset of AI and is widely used in Natural Language Processing (NLP). We
understood that Sentiment Analysis is an important application of AI which automatically
classifies text as positive or negative. For this assignment we have briefly introduced and
analyzed the topic of sentiment analysis. An application will be developed for analyzing the
sentiment of YouTube comments.

4.2. Solution addressing the real-world problems


Based on the research above, we can conclude that sentiment analysis is an important tool for
improving human life. Sentiment analysis of YouTube comments will help YouTubers understand
viewers’ preferences, which may in turn help increase their revenue. With sufficiently accurate
sentiment analysis, YouTube administrators can reduce online abuse by deleting offensive
comments and so protect video creators. It can also help YouTubers improve their content.

4.3. Further work


In this coursework we have conducted research on various topics in AI. We understood the
general concepts of NLP and ML and learned about sentiment analysis. As further work, we will
develop a working application that conducts sentiment analysis on YouTube comments collected
from a dataset. After coding, final documentation will be written to explain the steps and
methods used in the development.

5. Bibliography
Bhuiyan, H., Ara, J., Bardhan, R. & Islam, R. (2017) Retrieving YouTube Video by Sentiment
Analysis on User Comment. Proc. of the 2017 IEEE International Conference on Signal and Image
Processing Applications, p.478.

Cambria, E. (2017) A Practical Guide to Sentiment Analysis (Socio-Affective Computing). 1st ed.
Springer. p.196.

educba. (2020) Sentiment Analysis in Social Media [Online]. Available from:
https://www.educba.com/sentiment-analysis-social-media/ [Accessed 2020].

Hardy, J. (2020) Social Media Today [Online]. Available from:
https://www.socialmediatoday.com/content/introduction-sentiment-analysis [Accessed 2020].

Jadav, S. (2017) Sentiment Analysis: A Review. Scientific Journal of Impact Factor (SJIF): 4.72,
p.962.

lexalytics. (2020) Sentiment Analysis Explained [Online]. Available from:
https://www.lexalytics.com/technology/sentiment-analysis [Accessed 2020].

Miner, C. (2019) What is Sentiment Analysis? [Online]. Available from:
https://callminer.com/blog/sentiment-analysis-examples-best-practices/ [Accessed 30 April 2019].

Monkey Learn. (2020) Sentiment Analysis [Online]. Available from:
https://monkeylearn.com/sentiment-analysis/ [Accessed 1 January 2020].

Parabhoi, L. & Saha, P. (2018) Sentiment Analysis of YouTube Comments on Koha Open Source
Software Videos. International Journal of Library and Information Studies, 8, p.102.

Pozzi, F.A. (2016) Sentiment Analysis in Social Networks. 1st ed. Morgan Kaufmann. p.284.

Ray, S. (2017) 6 Easy Steps to Learn Naive Bayes Algorithm with codes in Python and R [Online].
Available from:
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/?fbclid=IwAR1-5mSCWS8WwOHc3B6OJPy8-R73G3OqTxDWn42c528CoOZO2jw5BQYXmSM
[Accessed 11 September 2017].

theappsolutions. (2020) 4 TYPES OF MACHINE LEARNING ALGORITHMS [Online]. Available from:
https://theappsolutions.com/blog/development/machine-learning-algorithm-types/ [Accessed 13
January 2020].

Timoney, J., Raj, A. & Davis, B. (2019) Nostalgic Sentiment Analysis of YouTube Comments for
Chart Hits of the 20th Century. Maynooth: Dept. of Computer Science, Maynooth University.
