
DATA SCIENCE LAB

Mini Project Report


Topic: Text Summarization
Name: Vemula Yaminee Jyothsna
Roll no: 20BM6JP44

Introduction:
Many sectors nowadays, such as online shopping, government, private organizations, and tourism, struggle to make sense of customer feedback in order to improve their services and products. These organizations receive huge volumes of data every single day, and this data is not in a structured format. Manually evaluating the reviews and feedback of every customer in order to satisfy their needs would be a very hectic job.
Machine learning, a booming field, can help us understand human language and summarize content in a way that highlights the important information in a large text. This can be achieved using a field of machine learning known as Natural Language Processing, abbreviated as NLP.

Applications of Text summarization:

• Newsletters
• Media monitoring
• Understanding customer satisfaction in the form of reviews
• Social media monitoring
• Video scripting
• Helping disabled people
• Help desk and customer reviews.

Text summarization can be achieved by two different approaches:

• Extractive summarization
• Abstractive summarization.
Extractive summarization:
In this approach, we identify the important sentences in the original text and extract
them to form the summary.

Abstractive summarization:
In this approach, we generate new sentences from the original text to form the
summary. The sentences in the summary might not appear in the original text. This method is
in contrast to extractive summarization.

Objective:
To generate a summary for each Amazon fine food customer reviews available at
https://www.kaggle.com/snap/amazon-fine-food-reviews

Dataset Description:
The Amazon fine food reviews dataset, available on Kaggle, is used. This dataset has 10 columns:
Id, ProductId, UserId, ProfileName, HelpfulnessNumerator, HelpfulnessDenominator,
Score, Time, Summary, and Text. The Text column contains the customer's detailed review in the
form of a paragraph. The total number of products described is 56845.

For this project, we consider only the Text column as input and the Summary column for validating
the results from our model.
A sample of the dataset is shown below.

Algorithms Used:
Two algorithms have been used for extractive summarization. They are:

• Word frequency
• Term frequency-inverse document frequency (tf-idf)
Word frequency:
In this algorithm, we calculate a score for each sentence and summarize using the
highest-scoring sentences. The steps are as follows:
Step-1: Tokenize the text data into sentences and words.
Step-2: Remove the stop words.
Step-3: Create a frequency table with a score for each word.
Step-4: Calculate the score of each sentence from the frequency table.
Step-5: Calculate the average sentence score and select sentences scoring above it.
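As an illustration, the five steps above can be sketched in Python. This is a minimal version: it uses a naive regex tokenizer and a small hand-picked stopword list, whereas the actual implementation presumably used a fuller tokenizer and stopword list (e.g. NLTK's).

```python
import re
from collections import Counter

# A small illustrative stopword list; an assumption, not the list
# used in the project.
STOPWORDS = {"the", "is", "a", "an", "to", "it", "and", "i", "this",
             "of", "in", "not", "was", "were"}

def summarize_by_word_frequency(text):
    """Score sentences by summed word frequencies and keep those
    scoring above the average sentence score (Steps 1-5)."""
    # Step 1: tokenize into sentences and words.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    # Step 2: remove stop words.
    words = [w for w in words if w not in STOPWORDS]
    # Step 3: frequency table of the remaining words.
    freq = Counter(words)
    # Step 4: score each sentence as the sum of its word frequencies.
    scores = {s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
              for s in sentences}
    # Step 5: keep sentences scoring above the average sentence score.
    avg = sum(scores.values()) / len(scores)
    return " ".join(s for s in sentences if scores[s] > avg)
```

For example, in a text where one sentence repeats the most frequent words, that sentence scores above the average and is retained as the summary.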

Term frequency-inverse document frequency (tf-idf):


TF-IDF, short for term frequency-inverse document frequency, is a numeric measure used
to score the importance of a word in a document based on how often it appears in
that document and in a given collection of documents. The intuition behind this measure is:
if a word appears frequently in a document, it should be important and we should give
that word a high score. But if a word also appears in too many other documents, it is probably
not a unique identifier, so we should assign a lower score to that word.
The formulas for calculating the TF, IDF, and TF-IDF values of a word w are:
TF(w) = (number of times w appears in a document) / (total number of words in the document)
IDF(w) = log_e(total number of documents / number of documents containing w)
TF-IDF(w) = TF(w) * IDF(w)
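These formulas can be sketched directly in Python. This is a minimal version that splits documents on whitespace and assumes the word occurs in at least one document (otherwise the IDF denominator would be zero).

```python
import math

def tf(word, document):
    # TF(w) = count of w in the document / total words in the document
    words = document.lower().split()
    return words.count(word) / len(words)

def idf(word, documents):
    # IDF(w) = log_e(N / number of documents containing w);
    # assumes the word occurs in at least one document.
    n_containing = sum(1 for d in documents if word in d.lower().split())
    return math.log(len(documents) / n_containing)

def tfidf(word, document, documents):
    # TF-IDF(w) = TF(w) * IDF(w)
    return tf(word, document) * idf(word, documents)
```

A sentence can then be scored as the sum of the tf-idf values of its words, and the highest-scoring sentences kept, as in the word-frequency approach.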

Evaluation measure:
For text summarization in this project, we have used the ROUGE score as the evaluation
metric.
ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It is a set of metrics for
evaluating the summarized text generated by an algorithm against the original summary text.
The three metrics the ROUGE score reports are:

• Precision
• Recall
• F-measure
Precision and recall in terms of ROUGE are as follows:
Precision = number of overlapping words / total number of words in the generated summary
Recall = number of overlapping words / total number of words in the original summary
F-measure = 2 * (Precision * Recall) / (Precision + Recall)
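A minimal sketch of the ROUGE-1 computation from these formulas follows. This is an illustration, not the library used to produce the scores reported below; overlap is counted per word up to the minimum of its counts in the two summaries (clipped counts), as in the standard ROUGE definition.

```python
from collections import Counter

def rouge_1(generated, reference):
    """Return (precision, recall, f_measure) from unigram overlap."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((gen & ref).values())
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    f_measure = (2 * precision * recall / (precision + recall)
                 if overlap else 0.0)
    return precision, recall, f_measure
```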

Results from the implemented code:

Word Frequency algorithm


Code for calculating the sentence score:
A few of the output results are as follows:
Text-1:
The product arrived labeled as Jumbo Salted Peanuts...the peanuts were small-sized
unsalted. Not sure if this was an error or if the vendor intended to represent the product as
"Jumbo".
Summary:
The product arrived labeled as Jumbo Salted Peanuts...the peanuts were small-sized
unsalted.
Text-2:
This oatmeal is not good. It's mushy, soft, I don't like it. Quaker Oats is the way to go.
Summary:
It's mushy, soft, I don't like it.
Evaluation scores are as follows. ROUGE-1 is used, i.e. overlap is measured at the
granularity of individual words.

METRIC       AVERAGE VALUE
Precision    0.8274
Recall       0.4449
F-measure    0.5641

Term frequency-inverse document frequency (tf-idf):

Code for calculating the tf-idf score of each sentence:


A few of the output results are as follows:
Text-1:
This peppermint stick is delicious and fun to eat. My dad got me one for Christmas because
he remembered me having a similar one when I was a little girl. I'm 30 now and I love it!
Summary:
My dad got me one for Christmas because he remembered me having a similar one when I
was a little girl.
Text-2:
I never in my life tasted such a good babka it's crazy good! This is the real babka! That my
gram mother use to make
Summary:
I never in my life tasted such a good babka it's crazy good!
Evaluation scores are as follows. ROUGE-1 is used, i.e. overlap is measured at the
granularity of individual words.

METRIC       AVERAGE VALUE
Precision    0.9455
Recall       0.4601
F-measure    0.6016

Observations:
• Precision, recall, and f-measure scores improve for the tf-idf model compared to the
  word frequency model.
• The precision, recall, and f-measure scores for the tf-idf algorithm were obtained
  with a 50% information retrieval value; they can be improved by increasing the
  information retrieval value.
• Cosine similarity between the generated summary and the Summary column in the data
  was also calculated, but the scores were very low, in the range of 0.01 to 0.2. The
  cause might be the high dimensionality of the generated summaries; this could be
  improved by performing PCA on the text to reduce the dimensionality of the summary
  to be generated.
• The generated summaries would be more explainable if large text documents were
  considered. Since the customer reviews here are very short paragraphs, there is not
  much difference between the original review and the generated summary.
• BERT-based models might give more concise summaries, but they could not be
  implemented because of installation problems.
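For reference, the cosine-similarity check mentioned in the observations can be sketched as a bag-of-words computation. This is an assumption about the vectorization, since the report does not state which one was used.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words count vectors
    of two texts; 0.0 if either text is empty."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Identical texts score 1.0 and texts with no shared words score 0.0, so the very low scores observed indicate little word overlap between the generated summaries and the dataset's Summary column.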
