Text Mining Report
1. Imports :
2. Stop words :
Stop words are words which are filtered out before or after processing
of natural language data (text). Though "stop words" usually refers to
the most common words in a language, there is no single universal
list of stop words used by all natural language processing tools, and
indeed not all tools even use such a list. Some tools specifically avoid
removing these stop words to support phrase search.
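As a rough illustration of stop-word filtering, here is a minimal sketch in plain Python. The stop list, the tokenizer and the function names are assumptions made for this example; they are not the list or the code used in the project.

```python
import re

# Hypothetical, deliberately tiny stop list. Real lists are much longer
# and differ from one tool to another.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop list."""
    return [token for token in tokens if token not in stop_words]

tokens = tokenize("The players are on the field and the match is about to start")
filtered = remove_stop_words(tokens)
print(filtered)
```

Using a set for the stop list keeps each membership check constant-time, which matters once the list grows to a few hundred words.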
3. Repeated words :
This part of our algorithm looks for words that recur in
different forms and reduces each of them to its root.
For example, if it finds the word ‘books’, it changes it to
‘book’, etc.
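The idea of reducing a word to its root can be sketched with a naive suffix-stripping function. This is only a toy rule set written for illustration; the report does not say which stemmer or lemmatizer the project uses, and the name naive_stem is hypothetical.

```python
def naive_stem(word):
    """Strip a few common English plural endings (toy rule set, not a real stemmer)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"   # "stories" -> "story"
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]         # "matches" -> "match"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]         # "books" -> "book"
    return word

stems = [naive_stem(w) for w in ["books", "book", "stories", "matches", "class"]]
print(stems)
```

A production pipeline would normally rely on an established stemmer or lemmatizer, since hand-written rules like these miss irregular forms such as "children".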
4. Frequent words :
The WordNet function finds all the synonyms of a word and
replaces them with a single word.
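The synonym-merging step can be sketched as a lookup table that maps every member of a synonym group to one representative word. The groups below are hand-written stand-ins for what WordNet would return, and all names are assumptions for this example.

```python
# Hypothetical synonym groups. In the project these would come from WordNet.
SYNONYM_GROUPS = [
    ["doctor", "physician", "medic"],
    ["match", "game", "fixture"],
]

def build_canonical_map(groups):
    """Map every word of a group to the first word of that group."""
    canonical = {}
    for group in groups:
        head = group[0]
        for word in group:
            canonical[word] = head
    return canonical

def normalize_synonyms(tokens, canonical):
    """Replace each token by its representative; unknown tokens are kept as-is."""
    return [canonical.get(token, token) for token in tokens]

CANONICAL = build_canonical_map(SYNONYM_GROUPS)
normalized = normalize_synonyms(["physician", "watched", "game"], CANONICAL)
print(normalized)
```

Merging synonyms this way shrinks the vocabulary, so texts that use different words for the same concept end up with more similar vectors.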
6. The key words vector :
This function scans all the texts we have, removes the
repeated words and stores the remaining words in one list.
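A minimal sketch of this step, assuming each text has already been turned into a list of tokens. The function name build_key_vector and the sample data are hypothetical.

```python
def build_key_vector(token_lists):
    """Collect every distinct token across all texts, in first-seen order."""
    seen = set()
    key_vector = []
    for tokens in token_lists:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                key_vector.append(token)
    return key_vector

docs = [
    ["goal", "match", "goal"],
    ["market", "price", "match"],
    ["doctor", "patient"],
]
key_vector = build_key_vector(docs)
print(key_vector)
```

Keeping the words in a list with a fixed order (rather than a bare set) matters here, because each position in the list later corresponds to one input of the classifier.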
7. Open file function :
This function loads a .txt file from the system, then
generates the key vector of that text using the original key
vector of the dataset. We then feed the generated vector
to the multilayer perceptron (MLP) that we saved after the
training step.
Finally, it classifies the text (sport, economic, medical)
according to the output of the MLP.
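The flow described above can be sketched as follows. Several points are assumptions made for this example and are not stated in the report: the vector is a binary presence vector, the model is any object with a predict method that returns a class index per input row, and the indices map to the labels in the order sport, economic, medical.

```python
import re

# Assumed mapping from class index to label; the real order depends on training.
LABELS = ["sport", "economic", "medical"]

def text_to_vector(text, key_vector):
    """Binary presence vector: 1 if the key word occurs in the text, else 0."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in tokens else 0 for word in key_vector]

def classify_file(path, key_vector, model, labels=LABELS):
    """Load a text file, vectorize it against key_vector and classify it."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    vector = text_to_vector(text, key_vector)
    # `model` is assumed to expose predict(rows) -> sequence of class indices.
    class_index = int(model.predict([vector])[0])
    return labels[class_index]
```

The new text must go through the same preprocessing as the training texts (stop-word removal, root reduction, synonym merging) before it is vectorized; otherwise its vector is not comparable with the ones the MLP was trained on.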
8. The main function :