Memoir Presentation Tex
Aberkane Rania
Faculty of Sciences
Department of Computer Science
1 Introduction
2 Theoretical background
3 Preprocessing
4 Implementation
5 Results
6 Conclusion
1 Introduction
The rapid growth of the internet and computer technologies has produced billions of electronic text
documents that are created, edited, and stored digitally. Manual procedures for text classification have
therefore become laborious, time-consuming, and potentially unreliable, so we must use automatic
techniques to assign texts to categories.
Text classification has drawn more and more attention recently; it has been applied in different domains,
including web mining, opinion mining, and sentiment analysis.
2 Theoretical background
Today, sentiment analysis is one of the fastest-growing research areas in computer science. So what is
sentiment analysis?
Definition
Sentiment analysis is a set of techniques that help extract subjective information and analyze the
sentiments of people interacting online through social channels such as Facebook, Twitter,
Instagram, comments, and other social networking sites.
Given the importance of sentiment analysis, the number of papers on the topic is increasing rapidly, as
can be observed from the figure:
The rapid growth of Arabic internet content in recent years has raised the need for Arabic language
processing tools. So, how can we classify Arabic text, and which tools can help us do that?
3 Preprocessing
Before we can use the collected data, we need to do some preprocessing to remove unnecessary information:
Removal of URLs.
Tokenization.
Normalization (replacing specific letters, numeric data, punctuation, spaces, and single letters).
Removal of stop words, such as pronouns, conjunctions, prepositions, and names.
Stemming.
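The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the pipeline used in this work: the regular expressions, the letter replacements in `normalize`, the tiny stop-word list, and the `light_stem` rule (stripping only the definite article) are all assumptions chosen for the example.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Arabic diacritics (U+064B..U+0652) and the tatweel elongation mark (U+0640).
DIACRITICS_RE = re.compile("[\u064B-\u0652\u0640]")
# Alef variants (madda, hamza above, hamza below) to be unified into bare alef.
ALEF_VARIANTS_RE = re.compile("[\u0622\u0623\u0625]")
# Anything that is not an Arabic letter or whitespace: digits, punctuation, Latin text.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")


def normalize(text):
    """Replace specific letters and drop diacritics, digits, and punctuation."""
    text = DIACRITICS_RE.sub("", text)
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # unify alef variants
    text = text.replace("\u0649", "\u064A")      # alef maqsura -> ya (assumed choice)
    text = text.replace("\u0629", "\u0647")      # ta marbuta -> ha (assumed choice)
    return NON_ARABIC_RE.sub(" ", text)


# Illustrative stop words only; they are normalized so they match normalized tokens.
STOP_WORDS = {normalize(w) for w in ("في", "من", "على", "هذا", "هو")}


def light_stem(token):
    """Hypothetical light stemmer: strip a leading definite article."""
    if token.startswith("\u0627\u0644") and len(token) > 3:
        return token[2:]  # the remainder keeps at least two letters
    return token


def preprocess(text):
    text = URL_RE.sub(" ", text)                  # 1. remove URLs
    tokens = text.split()                         # 2. tokenize on whitespace
    normalized = []
    for tok in tokens:                            # 3. normalize each token
        normalized.extend(normalize(tok).split())
    kept = [t for t in normalized                 # 4. drop single letters and stop words
            if len(t) > 1 and t not in STOP_WORDS]
    return [light_stem(t) for t in kept]          # 5. stem
```

Calling `preprocess` on a raw string returns a list of cleaned tokens. A full Arabic stemmer would reduce words much further than the article stripping shown here.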
4 Implementation
Once preprocessing is complete, the following question comes to mind: how do we prepare data for
machine learning?
Machine learning algorithms cannot work with raw text directly; the text must be converted into numbers.
It should be represented like this:
So, how can we transform the data into numbers, and what can the features represent? That is what we
will discover in the next slides.
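One common way to turn text into numbers, sketched here under the assumption of a simple bag-of-words count representation, is to map each distinct token to a column and count how often it occurs in each document. The function names `build_vocabulary` and `vectorize` and the toy English tokens are illustrative only.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map every distinct token to a column index (sorted for reproducibility)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(doc, vocab):
    """Turn one tokenized document into a vector of token counts."""
    vec = [0] * len(vocab)
    for tok, count in Counter(doc).items():
        if tok in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[tok]] = count
    return vec


# Toy tokenized documents, for illustration only.
docs = [["good", "film"], ["bad", "film", "bad"]]
vocab = build_vocabulary(docs)
matrix = [vectorize(doc, vocab) for doc in docs]
```

Each row of `matrix` is one document and each column is one feature, which is the numeric form that classifiers expect as input.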
N-grams are the basic features of the bag-of-words model. Features can be single words (unigrams), pairs
of consecutive words (bigrams), or triples of consecutive words (trigrams).
Example:
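A minimal Python sketch of n-gram extraction, assuming the text has already been tokenized; the function name `ngrams` and the sample sentence are illustrative only.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["the", "film", "was", "good"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

A sentence of four tokens yields four unigrams, three bigrams, and two trigrams. Feature sets can be combined, for example by concatenating the unigram and bigram lists before counting.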
We tested six machine learning classification methods that are commonly used in sentiment analysis.
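As an illustration of one such classifier, the sketch below implements Multinomial Naive Bayes, the method named in the conclusion of this deck, from scratch over token counts. It is a minimal version under stated assumptions: Laplace smoothing with `alpha=1.0`, unseen tokens ignored at prediction time, and toy English training data. The function names `train_mnb` and `predict_mnb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized documents."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        feature_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs_in_class in class_counts.items():
        total = sum(feature_counts[label].values())
        denom = total + alpha * len(vocab)  # Laplace-smoothed denominator
        model[label] = {
            "log_prior": math.log(n_docs_in_class / len(docs)),
            "log_likelihood": {
                tok: math.log((feature_counts[label][tok] + alpha) / denom)
                for tok in vocab
            },
        }
    return model


def predict_mnb(model, tokens):
    """Return the label with the highest posterior log score."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokens:
            # Tokens never seen in training carry no evidence and are skipped.
            if tok in params["log_likelihood"]:
                score += params["log_likelihood"][tok]
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, for illustration only.
train_docs = [["good", "film"], ["great", "film"], ["bad", "film"], ["awful", "plot"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_mnb(train_docs, train_labels)
```

The tokens passed to `train_mnb` can be unigrams, bigrams, or both, so the same sketch works with any of the n-gram feature sets described earlier.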
5 Results
6 Conclusion
Multinomial Naive Bayes is likely the model that works best on our dataset.
We tested different features and found that unigrams and bigrams work best.
By monitoring attitudes and opinions about any topic, we can detect shifts in opinion and
adapt readily to meet changing needs.
We would like to develop our dataset to support different categorizations with good accuracy.