Detection of Fake News
[Architecture diagram: training data (news dataset) → NLTK pre-processing → ML model training (Naive Bayes) → testing data → evaluation / save model & weights]
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r'\w+')   # keep only word characters

# Pre-processing pipeline applied to each message
message = TextProcessor.replace_emojis(message)          # map emojis to text
message = TextProcessor.remove_mention_and_url(message)  # strip @mentions and URLs
message = TextProcessor.remove_punctuation(message)
message = TextProcessor.tokenizer.tokenize(message.lower())  # lowercase and tokenize
message = TextProcessor.remove_stopwords(message)
message = TextProcessor.word_stemmer(message)
message = TextProcessor.word_lemmatizer(message)
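The `TextProcessor` helpers referenced above are not shown in the document. The following is a minimal, self-contained sketch of a few of them; the real class presumably uses NLTK's stopword corpus and stemmers, while this version substitutes a tiny inline stopword set and plain regular expressions so it runs without any corpus downloads.

```python
import re

class TextProcessor:
    """Minimal sketch of the pre-processing helpers (assumed structure,
    not the document's actual implementation)."""

    # Tiny illustrative subset; the real pipeline would use NLTK's list
    STOPWORDS = {"i", "me", "my", "we", "our", "you", "the", "a", "is", "this"}

    @staticmethod
    def remove_mention_and_url(text):
        # Drop @mentions and http(s) URLs
        return re.sub(r"(@\w+|https?://\S+)", "", text)

    @staticmethod
    def remove_punctuation(text):
        return re.sub(r"[^\w\s]", "", text)

    @staticmethod
    def tokenize(text):
        # Equivalent of RegexpTokenizer(r'\w+') on lowercased text
        return re.findall(r"\w+", text.lower())

    @staticmethod
    def remove_stopwords(tokens):
        return [t for t in tokens if t not in TextProcessor.STOPWORDS]


msg = "Check this out @user https://x.co FAKE news!!"
tokens = TextProcessor.remove_stopwords(
    TextProcessor.tokenize(
        TextProcessor.remove_punctuation(
            TextProcessor.remove_mention_and_url(msg))))
print(tokens)  # ['check', 'out', 'fake', 'news']
```

Each stage takes the previous stage's output, mirroring the chained `message = ...` assignments above.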
Tokenization
Stopwords, e.g. 'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', and so on.
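The tokenization and stopword-removal steps can be sketched together. `RegexpTokenizer` is NLTK's real class and needs no corpus download; the stopword set here is a small inline subset of the list above (the full list comes from NLTK's stopword corpus).

```python
from nltk.tokenize import RegexpTokenizer

# r'\w+' keeps runs of word characters, silently dropping punctuation
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize("Breaking: fake news spreads fast!".lower())
print(tokens)  # ['breaking', 'fake', 'news', 'spreads', 'fast']

# Illustrative stopword subset; the real pipeline uses the full NLTK list
stop = {'i', 'me', 'my', 'this', 'that', 'these'}
filtered = [t for t in tokens if t not in stop]
```

Stopwords carry little class signal for fake-vs-real classification, so removing them shrinks the vocabulary the model must learn.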
DATA FLOW
1: Input dataset
2: Pre-process
3: X, Y feature/label extraction
4: Train/test split
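The data flow above, ending in the Naive Bayes ("ML-NB") stage from the architecture diagram, can be sketched with scikit-learn. The texts and labels below are toy placeholders, not the document's actual dataset, and the bag-of-words vectorizer stands in for the full NLTK pre-processing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy labelled corpus (illustrative only): 1 = fake, 0 = real
texts = ["aliens built the pyramids overnight",
         "miracle cure hidden by doctors",
         "council approves new budget",
         "court ruling published today"] * 10
labels = [1, 1, 0, 0] * 10

# Pre-process: bag-of-words feature matrix X, labels Y
X = CountVectorizer().fit_transform(texts)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42)

# Multinomial Naive Bayes training and evaluation
model = MultinomialNB().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Multinomial Naive Bayes suits word-count features because it models each class as a multinomial distribution over the vocabulary.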