Text Mining

Document -> a collection of words (each word = a feature); word order and grammar are ignored (the bag-of-words model)

Each word -> given equal weight as a candidate keyword
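
A minimal sketch of this bag-of-words view, assuming lowercase whitespace tokenization. The helper name `bag_of_words` is illustrative and not part of the notes:

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to word counts; word order is discarded."""
    # Assumption: lowercase and split on whitespace (no punctuation handling).
    return Counter(text.lower().split())

a = bag_of_words("the cat chased the dog")
b = bag_of_words("the dog chased the cat")

# Same words in a different order give the same representation.
print(a == b)  # True
```

Because only the counts are kept, two sentences with opposite meanings can map to the same bag, which is the price of ignoring order and grammar.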


The word repeated most often (highest term frequency) = keyword of the document
TF(t, d): number of times word t occurs in document d
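
As a rough illustration, TF can be computed as a raw count. This sketch assumes whitespace tokenization and the raw-count definition (some treatments normalize by document length instead); the function names `tf` and `keyword` are hypothetical:

```python
from collections import Counter

def tf(t, d):
    """Raw term frequency: number of times word t occurs in document d."""
    # Assumption: lowercase whitespace tokens; t is expected in lowercase.
    return Counter(d.lower().split())[t]

def keyword(d):
    """Return the word with the highest term frequency in d."""
    return Counter(d.lower().split()).most_common(1)[0][0]

doc = "data mining finds patterns in data and data needs cleaning"
print(tf("data", doc))  # 3
print(keyword(doc))     # data
```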

Clean the text (before comparing two texts for similarity)
Remove stop words = words whose removal does not change the meaning
Reduce each word to its root (word stemming)
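
The cleaning steps could be sketched as follows. Both the stop-word set and the suffix rules here are small illustrative assumptions, not a standard list or a real stemming algorithm (production stemmers such as Porter's apply many more rules):

```python
# Assumption: a tiny hand-picked stop-word set, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to"}

# Assumption: naive suffix stripping stands in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(text):
    """Lowercase, drop stop words, then stem the remaining words."""
    tokens = text.lower().split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(clean("The cats are jumping in the garden"))  # ['cat', 'jump', 'garden']

# After cleaning, two differently worded texts become directly comparable.
print(clean("The cat jumped") == clean("a cat jumps"))  # True
```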
