Text Prediction Analysis
Contents
● Problem Statement
● Motivation
● Implementation
● Data Preprocessing
● Techniques/Algorithms used
● References
Problem Statement
To help users write better text while preserving the context of the statement and making it read professionally.
Motivation
● Many young people and teenagers nowadays spend much of their time completing unfinished emails,
essays, assignments, projects, etc.
● This project could reduce the time these tasks require by recommending text
that reads more professionally while still giving the user's text a unique touch.
Preprocessing
● Tokenizer: The word tokenizer splits the sentences into relevant words.
● Named Entity Recognition: Recognizes the parts of speech of the split words for better prediction.
● Stemming/Lemmatization: These processes will return the root word of the words in the list.
Tokenizer
E.g.: Consider the statement, “The monkeys are eating bananas on the tree!”.
The output is an array of the words in the input statement: [“The”,
“monkeys”, “are”, “eating”, “bananas”, “on”, “the”, “tree”, “!”].
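The tokenization step above can be sketched with a regular expression. This is a minimal illustration, not the tokenizer the project actually uses; the function name `simple_word_tokenize` is hypothetical, and the rule (runs of word characters, or single punctuation marks) is an assumption chosen to reproduce the example output.

```python
import re

def simple_word_tokenize(text):
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_word_tokenize("The monkeys are eating bananas on the tree!")
print(tokens)
# ['The', 'monkeys', 'are', 'eating', 'bananas', 'on', 'the', 'tree', '!']
```

A library tokenizer would usually handle contractions, abbreviations and hyphenation more carefully than this single pattern does.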
Named Entity Recognition(NER)
After tokenization splits the text, NER is applied. NER identifies the entity type of every word passed to it.
Stemming and Lemmatization
Stemming and lemmatization both reduce a word to a simpler root form. The two are similar, but their results
differ slightly: stemming strips suffixes by rule and may produce a non-word, while lemmatization returns the
dictionary form of the word.
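The difference between the two can be seen in a toy sketch. Everything here is an assumption made for illustration: the suffix list, the tiny lemma dictionary, and the names `toy_stem` and `toy_lemmatize` are hypothetical, and real stemmers and lemmatizers use far richer rules and vocabularies.

```python
# Hypothetical suffix rules, tried in order (longest first).
SUFFIXES = ("ing", "es", "s")

# Hypothetical lemma dictionary; a real lemmatizer uses a full vocabulary.
LEMMAS = {
    "monkeys": "monkey",
    "eating": "eat",
    "studies": "study",
    "are": "be",
}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[:-len(suffix)]
    return w

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    w = word.lower()
    return LEMMAS.get(w, w)

for word in ["monkeys", "eating", "studies", "are"]:
    print(word, "->", toy_stem(word), "/", toy_lemmatize(word))
```

For "studies" the stemmer yields "studi", which is not a word, while the lemmatizer yields "study"; for "are" the stemmer changes nothing, while the lemmatizer returns "be".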
TF-IDF
TF-IDF creates a data frame with features of tokenized words (similar to Bag of Words (BoW)), but it
scales up the rare terms and scales down the frequent terms.
E.g.: Consider the statement, “The monkeys are small. The ducks are also small”.
Here, the words “the”, “are” and “small” each have a frequency of 2, so they are scaled down, while
words with lower frequency, like “monkeys” or “ducks”, are scaled up.
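The weighting can be worked through on the example statement. This is a sketch under stated assumptions: each sentence is treated as its own document, term frequency is the count divided by the document length, and the inverse document frequency is the unsmoothed log(N / df). The function names are hypothetical, and library implementations often use a smoothed variant that gives different numbers.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only (assumption for this sketch).
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(documents):
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = ["The monkeys are small.", "The ducks are also small"]
scores = tf_idf(docs)
for doc, weights in zip(docs, scores):
    print(doc, weights)
```

With this unsmoothed formula, terms that occur in both sentences ("the", "are", "small") get a weight of 0, while "monkeys", "ducks" and "also" get positive weights.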
Sentiment Analysis
Sentiment analysis can be used to select the word that best fits the context of the statement. It can be
run with TextBlob or by training a machine learning model. TextBlob does not require training. It reports the
polarity and subjectivity of a text: polarity ranges from -1 (negative) to 1 (positive), and subjectivity
ranges from 0 (objective) to 1 (subjective).
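The idea of choosing a word by sentiment can be sketched with a small lexicon-based scorer. This is not TextBlob's implementation: the lexicon values and the names `polarity` and `best_fit` are hypothetical, chosen only to show how a polarity score in the range -1 to 1 could rank candidate words against the context.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
POLARITY = {
    "good": 0.7, "great": 0.8, "excellent": 1.0,
    "poor": -0.4, "bad": -0.7, "terrible": -1.0,
}

def polarity(text):
    """Average polarity of the known words in the text; 0.0 if none are known."""
    words = re.findall(r"[a-z]+", text.lower())
    scored = [POLARITY[w] for w in words if w in POLARITY]
    return sum(scored) / len(scored) if scored else 0.0

def best_fit(context, candidates):
    """Pick the candidate whose polarity is closest to that of the context."""
    target = polarity(context)
    return min(candidates, key=lambda word: abs(polarity(word) - target))

context = "The service was great and the food was"
print(best_fit(context, ["excellent", "terrible"]))  # excellent
```

A trained model or a library scorer would replace the toy lexicon; the selection step, comparing each candidate's score with the context's score, would stay the same.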
References
● analyticsvidhya.com/blog/2021/06/must-known-techniques-for-text-preprocessing-in-nlp/
● E. Chan, J. Ginsburg, B. Ten Eyck, J. Rozenblit and M. Dameron, "Text analysis and entity extraction in asymmetric threat response and
prediction," 2010 IEEE International Conference on Intelligence and Security Informatics, 2010, pp. 202-207, doi:
10.1109/ISI.2010.5484737.
● C. -z. Liu, Y. -x. Sheng, Z. -q. Wei and Y. -Q. Yang, "Research of Text Classification Based on Improved TF-IDF Algorithm," 2018 IEEE
International Conference of Intelligent Robotic and Control Engineering (IRCE), 2018, pp. 218-222, doi: 10.1109/IRCE.2018.8492945.