
BAG OF WORDS ALGORITHM

Paragraph –

We can use health chatbots for treating stress.

We can use NLP to create chatbots and we will be making health chatbots now!

Health chatbots cannot replace human counselors now.

Step 1- Text Normalization

Sentence segmentation

Sent 1: We can use health chatbots for treating stress.

Sent 2: We can use NLP to create chatbots and we will be making health chatbots now!

Sent 3: Health chatbots cannot replace human counselors now.
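
The segmentation above can be sketched in Python. This is a minimal illustration, assuming a sentence ends at '.', '!' or '?' followed by whitespace; the variable names are made up for this example, and real text with abbreviations would need a proper sentence tokenizer.

```python
import re

paragraph = (
    "We can use health chatbots for treating stress. "
    "We can use NLP to create chatbots and we will be making health chatbots now! "
    "Health chatbots cannot replace human counselors now."
)

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())

for number, sentence in enumerate(sentences, start=1):
    print(f"Sent {number}: {sentence}")
```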

Tokenization

Sent 1: We can use health chatbots for treating stress .

Sent 2: We can use NLP to create chatbots and we will be making health chatbots now !

Sent 3: Health chatbots cannot replace human counselors now .
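
Tokenization can be sketched with a regular expression. This is only an illustration, assuming a token is either a run of letters/digits or a single punctuation mark.

```python
import re

sentence = "We can use NLP to create chatbots and we will be making health chatbots now!"

# \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
```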

Removing stop words, special characters, numbers

Sent 1: We health chatbots treating stress

Sent 2: We NLP create chatbots making health chatbots

Sent 3: Health chatbots replace human counselors
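
A rough sketch of this filtering step for sentences 1 and 3 follows. The stop-word list is hypothetical, picked only so that the output matches these two sentences (it keeps "we", as the worked example does); a real project would normally take the list from an NLP library.

```python
# Hypothetical stop-word list, chosen only to reproduce the worked example.
STOP_WORDS = {"can", "use", "for", "to", "and", "will", "be", "now", "cannot"}

def remove_stop_words(tokens):
    """Keep alphabetic tokens that are not stop words (drops punctuation and numbers)."""
    return [t for t in tokens if t.isalpha() and t.lower() not in STOP_WORDS]

sent1 = ["We", "can", "use", "health", "chatbots", "for", "treating", "stress", "."]
sent3 = ["Health", "chatbots", "cannot", "replace", "human", "counselors", "now", "."]

print(" ".join(remove_stop_words(sent1)))
print(" ".join(remove_stop_words(sent3)))
```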

Converting sentences to lower case

Sent 1: we health chatbots treating stress

Sent 2: we nlp create chatbots making health chatbots

Sent 3: health chatbots replace human counselors

Stemming

Sent 1: we health chatbots treat stress


Sent 2: we nlp create chatbots make health chatbots

Sent 3: health chatbots replace human counselor

Lemmatization

Sent 1: we health chatbot treat stress

Sent 2: we nlp create chatbot make health chatbot

Sent 3: health chatbot replace human counselor
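
Lemmatization maps each word to its dictionary form. The sketch below uses a tiny hypothetical lookup table that covers only the words of this example; it is not how a real lemmatizer works internally, and a library lemmatizer would be used in practice.

```python
# Hypothetical lookup table covering only the words in this example.
LEMMAS = {
    "chatbots": "chatbot",
    "treating": "treat",
    "making": "make",
    "counselors": "counselor",
}

def lemmatize(tokens):
    """Replace each token with its dictionary form; unknown tokens pass through."""
    return [LEMMAS.get(token, token) for token in tokens]

sent2 = ["we", "nlp", "create", "chatbots", "making", "health", "chatbots"]
print(" ".join(lemmatize(sent2)))
```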

Step 2- Create dictionary

we  health  chatbot  treat  stress  nlp  create  make  replace  counselor  human
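
Building the dictionary amounts to collecting each distinct word once. A minimal sketch, assuming words are added in order of first appearance (so "human" comes before "counselor" here; the order does not matter as long as it stays fixed afterwards):

```python
sentences = [
    "we health chatbot treat stress",
    "we nlp create chatbot make health chatbot",
    "health chatbot replace human counselor",
]

dictionary = []
for sentence in sentences:
    for word in sentence.split():
        # Add a word only the first time it is seen.
        if word not in dictionary:
            dictionary.append(word)

print(dictionary)
```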

Step 3- Make document vector for all the sentences

        we  health  chatbot  treat  stress  nlp  create  make  replace  counselor  human
Sent 1  1   1       1        1      1       0    0       0     0        0          0
Sent 2  1   1       2        0      0       1    1       1     0        0          0
Sent 3  0   1       1        0      0       0    0       0     1        1          1
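
The document vectors can be sketched as word counts per sentence. This illustration counts every occurrence, so a word that appears twice in a sentence (chatbot in sentence 2) gets the value 2; the variable names are assumptions made for this example.

```python
from collections import Counter

dictionary = ["we", "health", "chatbot", "treat", "stress", "nlp",
              "create", "make", "replace", "counselor", "human"]

sentences = [
    "we health chatbot treat stress",
    "we nlp create chatbot make health chatbot",
    "health chatbot replace human counselor",
]

vectors = []
for sentence in sentences:
    counts = Counter(sentence.split())  # occurrences of each word in this sentence
    vectors.append([counts[word] for word in dictionary])

for number, vector in enumerate(vectors, start=1):
    print(f"Sent {number}:", vector)
```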

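Step 4- TFIDF

Term frequency

The term frequency TF(W) of a word is the number of times it occurs in one sentence, so it is given by the document vectors of Step 3.

Document frequency (number of sentences that contain the word)

we  health  chatbot  treat  stress  nlp  create  make  replace  counselor  human
2   3       3        1      1       1    1       1     1        1          1

Inverse document frequency (total number of sentences / document frequency)

we   health  chatbot  treat  stress  nlp  create  make  replace  counselor  human
3/2  3/3     3/3      3/1    3/1     3/1  3/1     3/1   3/1      3/1        3/1

TFIDF(W) = TF(W) * log(IDF(W)), with log taken to base 10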

        we          health      chatbot     treat       stress      nlp         create      make        replace     counselor   human
Sent 1  1*log(3/2)  1*log(3/3)  1*log(3/3)  1*log(3/1)  1*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)
Sent 2  1*log(3/2)  1*log(3/3)  2*log(3/3)  0*log(3/1)  0*log(3/1)  1*log(3/1)  1*log(3/1)  1*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)
Sent 3  0*log(3/2)  1*log(3/3)  1*log(3/3)  0*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)  0*log(3/1)  1*log(3/1)  1*log(3/1)  1*log(3/1)

TFIDF values

        we     health  chatbot  treat  stress  nlp    create  make   replace  counselor  human
Sent 1  0.176  0       0        0.477  0.477   0      0       0      0        0          0
Sent 2  0.176  0       0        0      0       0.477  0.477   0.477  0        0          0
Sent 3  0      0       0        0      0       0      0       0      0.477    0.477      0.477
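
The values in this table can be reproduced with a short script. This is a sketch, assuming base-10 logarithms (which is what 0.176 and 0.477 correspond to) and rounding to three decimals; the variable names are illustrative.

```python
import math
from collections import Counter

dictionary = ["we", "health", "chatbot", "treat", "stress", "nlp",
              "create", "make", "replace", "counselor", "human"]

sentences = [
    "we health chatbot treat stress",
    "we nlp create chatbot make health chatbot",
    "health chatbot replace human counselor",
]

tokenized = [sentence.split() for sentence in sentences]
n_docs = len(tokenized)

# Document frequency: number of sentences that contain the word.
df = {word: sum(1 for tokens in tokenized if word in tokens) for word in dictionary}

tfidf = []
for tokens in tokenized:
    tf = Counter(tokens)  # term frequency within this sentence
    tfidf.append([round(tf[word] * math.log10(n_docs / df[word]), 3)
                  for word in dictionary])

for number, row in enumerate(tfidf, start=1):
    print(f"Sent {number}:", row)
```

Words that occur in every sentence (health, chatbot) get log(3/3) = 0, so they carry no weight no matter how often they appear.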

CONCLUSION

In sentence 1, treat and stress have the highest values (0.477), so they are given priority over the other words.

In sentence 2, nlp, create and make have the highest values (0.477), so they are given priority over the other words.

In sentence 3, replace, counselor and human have the highest values (0.477), so they are given priority over the other words.

-O.Varshitha
10C
