Text Mining Report


RNN Mini-Project Report

1. Imports :

First, we import the packages that the project needs.

2. Stop words :

Stop words are words which are filtered out before or after processing
of natural language data (text). Though "stop words" usually refers to
the most common words in a language, there is no single universal
list of stop words used by all natural language processing tools, and
indeed not all tools even use such a list. Some tools specifically avoid
removing these stop words to support phrase search.
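
As an illustration, stop-word filtering can be sketched as follows. The
stop-word list here is a tiny hand-written assumption, not the list used in
the project, and the function name `remove_stop_words` is hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The price of oil is rising in the market"))
# ['price', 'oil', 'rising', 'market']
```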

3. Repeated words :

This part of our algorithm looks for the different forms of a word
and replaces each of them with the root of that word (stemming).
For example, if it finds the word 'books', it changes it to 'book', etc.
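
A minimal sketch of this idea is shown below. It uses a naive suffix-stripping
rule written only for illustration; the suffix list and the name `naive_stem`
are assumptions, and a real project would more likely rely on an established
stemmer or lemmatizer.

```python
# Toy suffix list (an assumption); it is checked in this order.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip one common English suffix to approximate the root of a word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words such as "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["books", "book", "reading"]])
# ['book', 'book', 'read']
```

Such a rule is crude (it does not handle irregular forms), but it shows how
several surface forms collapse into a single entry.
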
4. Frequent words :

This function counts how often each word occurs in the text.
A word that occurs more than 5 times is treated as a frequent
word and kept; otherwise it is deleted.
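
The frequency filter described above could look like the following sketch.
The function name `keep_frequent_words` and the `min_count` parameter are
assumptions; only the threshold of 5 comes from the text.

```python
from collections import Counter

def keep_frequent_words(tokens, min_count=5):
    """Keep only the tokens whose word occurs more than min_count times."""
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] > min_count]

tokens = ["goal"] * 6 + ["ball"] * 5 + ["team"]
print(sorted(set(keep_frequent_words(tokens))))
# ['goal']
```

Note that "ball" appears exactly 5 times, so it is dropped: the rule is
"more than 5", not "at least 5".
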
5. WordNet :

The WordNet function finds all the synonyms of a word and
replaces them with a single representative word.
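
The sketch below illustrates the merging step only. A small hand-written
synonym table stands in for the WordNet lookup, and the choice of the
alphabetically first word as the representative is an assumption made for
the example.

```python
# Hypothetical synonym groups; in the project these would come from WordNet.
SYNONYM_GROUPS = [
    {"doctor", "physician", "medic"},
    {"game", "match"},
]

def build_canonical_map(groups):
    """Map every word of a group to one representative word of that group."""
    canonical = {}
    for group in groups:
        representative = min(group)  # alphabetically first word (assumption)
        for word in group:
            canonical[word] = representative
    return canonical

def merge_synonyms(tokens, canonical):
    """Replace each token by its representative; unknown words are unchanged."""
    return [canonical.get(t, t) for t in tokens]

canonical = build_canonical_map(SYNONYM_GROUPS)
print(merge_synonyms(["physician", "match", "ball"], canonical))
# ['doctor', 'game', 'ball']
```
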
6. The key words vector :

This function scans all the texts we have, removes the
duplicate words, and stores the remaining words in one list.
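
One way to build such a list is sketched below. The name `build_key_vector`
is hypothetical, and keeping the words in first-seen order is an assumption
(it makes the vector positions reproducible from one run to the next).

```python
def build_key_vector(documents):
    """Collect every distinct word across all documents, in first-seen order.

    documents: a list of token lists, one per text.
    """
    seen = set()
    key_vector = []
    for tokens in documents:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                key_vector.append(token)
    return key_vector

docs = [["goal", "team", "goal"], ["market", "team", "price"]]
print(build_key_vector(docs))
# ['goal', 'team', 'market', 'price']
```
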
7. Open file function :

This function loads a .txt file from the system, then
generates the key vector of that text using the original key
vector of the dataset. The generated vector is then fed to
the multilayer perceptron (MLP) that we saved after the
training step.
Finally, it classifies the text (sport, economic, medical)
according to the result of the MLP.
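
The following sketch shows how a new text might be projected onto the
dataset's key vector and passed to a classifier. All names are hypothetical,
the use of word counts (rather than 0/1 flags) is an assumption, and
`predict` is any callable that returns a class index; it stands in for the
saved MLP, whose loading code is not shown in this report.

```python
import re
from pathlib import Path

LABELS = ["sport", "economic", "medical"]

def vectorize_text(text, key_vector):
    """Count how often each key word of the dataset occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(word) for word in key_vector]

def classify_file(path, key_vector, predict):
    """Load a text file, vectorize it, and map the predicted index to a label."""
    text = Path(path).read_text(encoding="utf-8")
    features = vectorize_text(text, key_vector)
    return LABELS[predict(features)]

key_vector = ["goal", "market", "team"]
print(vectorize_text("Goal! The team scored a goal.", key_vector))
# [2, 0, 1]
```

Words of the new text that are absent from the key vector are simply
ignored, so the vector always has the length the trained model expects.
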
8. The main function :

This is the main function, which calls all the previous
functions:
- Read and vectorize the whole dataset (test + training).
- Save the global key vector.
- Generate the target vector.
- Create the MLP model.
- Train the model, get its predictions on the test data,
then save the model.
- Calculate the precision of the model.
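
Some of the steps above can be sketched without committing to a particular
MLP library. The example below covers only the target vector, saving the key
vector, and scoring; the function names and the JSON file format are
assumptions, "precision" is read here as the fraction of correct predictions,
and the predictions are made-up values standing in for the MLP's output.

```python
import json

LABELS = ["sport", "economic", "medical"]

def make_target_vector(doc_labels):
    """Encode each document's class name as its index in LABELS."""
    return [LABELS.index(name) for name in doc_labels]

def save_key_vector(key_vector, path):
    """Store the global key vector so new texts can be vectorized later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(key_vector, f)

def accuracy(y_true, y_pred):
    """Fraction of test documents whose predicted class is correct."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_test = make_target_vector(["sport", "medical", "economic", "sport"])
y_pred = [0, 2, 0, 0]  # made-up predictions, not real model output
print(accuracy(y_test, y_pred))
# 0.75
```
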
9. The interface (GUI) :
