
Sector 29, Pradhikaran, Akurdi, Pune - Maharashtra, INDIA 411044

(Established by Maharashtra Act No. LXIII of 2017)

School of Computer Science, Engg. & Applications

Synopsis

Krish J Shetty

PRN: 20190802129
Email Id: 20190802129@dypiu.ac.in
Phone Number: 8828489577

Vedant R Landge

PRN: 20190802005
Email Id: 20190802005@dypiu.ac.in
Phone Number: 8237789479

TEXT SUMMARIZATION

ABSTRACT:

“We don’t need a full report. Just give me a summary of what you did.” This is something most
people hear repeatedly, both in college and in their professional lives. They slog for nights to
prepare a comprehensive report, only for teachers or supervisors to read nothing but the summary.
Writing that summary manually, however, is an extremely tedious, labor-intensive, and
time-consuming task. This is where Text Summarization using Deep Learning comes into the picture.

Project reports are only one fragment of the problem; we can also agree that the amount of textual
data produced every day is increasing rapidly. Social media, news articles, emails, and so on
generate massive amounts of information, making it next to impossible to go through all of it and
gain knowledge. Propelled by modern technological innovations, data is to this century what oil was
to the previous one. Today, our world runs on the gathering and dissemination of huge amounts of
data. In fact, the International Data Corporation (IDC) projects that the total amount of digital
data circulating annually around the world will grow from 4.4 zettabytes in 2013 to 180 zettabytes
in 2025. That is a lot of data! Thanks to Deep Learning, however, it is possible to create models
that shorten extremely long pieces of text into a summary, saving time and capturing the key points
effectively.

Text summarization is a key natural language processing (NLP) task that automatically converts a
text, or a collection of texts on the same topic, into a concise summary containing the key
semantic information. It benefits many downstream applications such as news digests, search
engines, and report generation. Text summarization is essentially the problem of creating a short,
accurate, and fluent summary of a longer text document. Automatic text summarization methods are
greatly needed to address the ever-growing amount of text available online, helping us both
discover relevant information and consume it faster. Machine learning models are usually trained
to understand documents and distill the useful information before outputting the required
summarized text.

In this project, we will dive deep into the problem of text summarization in natural language
processing.

KEYWORDS:

1. Deep Learning
2. Natural Language Processing (NLP)
3. International Data Corporation (IDC)

LITERATURE SURVEY:

i. Vishal Gupta and Gurpreet Singh:

“A Survey of Text Summarization Extractive Techniques.” In this paper, the authors describe
extractive summarization methods, which comprise two parts: pre-processing and processing. The
pre-processing step is further divided into sub-processes, namely sentence segmentation, stop-word
removal, and stemming. In the processing step, weights are assigned to the features used for
extracting the summary from the large document.

ii. Saranyamol C S and Sindhu L:

“A Survey on Automatic Text Summarization.” In this paper, the authors describe the two main
techniques used in automatic text summarization, namely extractive text summarization and
abstractive text summarization.

iii. Rafael Ferreira, Luciano de Souza Cabral, Rafael Dueire Lins, Gabriel Pereira Silva, Fred
Freitas, George D.C. Cavalcanti, Rinaldo Lima, Steven J. Simske, Luciano Favaro:

“Assessing sentence scoring techniques for extractive text summarization.” This paper gives a
brief description of the various features used to perform extractive summarization and also
describes methods for summary evaluation.

iv. K. Vimal Kumar, Divakar Yadav:

“An Improvised Extractive Approach for Hindi Text Summarization.” This paper mainly emphasizes
Hindi text summarization and describes the various features used for Hindi summarization with the
extractive approach. The authors propose a system that can generate summaries with 85% accuracy.

v. Vishal Gupta:

“Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents.” The author
of this paper has proposed a hybrid algorithm for Hindi and Punjabi text summarization. The
algorithm proposed by the author is the first algorithm that can summarize both Hindi as well as
Punjabi text.

vi. Ani Nenkova:

“Summarization Evaluation for Text and Speech: Issues and Approaches.” This paper suggests
methods for summary evaluation after the process of text summarization. Also, it describes
some human models for summary evaluation.

vii. Inderjeet Mani:

“Summarization Evaluation: An Overview.” The author describes various methods for evaluating
summaries.
PROBLEM DEFINITION & OBJECTIVES:

The aim of this project is to create a summary of a given text. There are wide-scale applications
for this. Elderly people can be helped by a talk-back system that summarizes, within minutes, the
news they would otherwise spend around three hours reading. We can also use this as a tool to
enhance meetings: when people attend a meeting, at the office for instance, they often end up with
messy notes when their superiors ask them to jot down the key takeaways. Text summarization can
help people capture those key takeaways without missing any. Another field of application is
helping people understand what someone is trying to say in a completely different language, which
can even help bridge the language barrier.

METHODOLOGY:

1. LIBRARIES:

Before we start with the project, we need to first import all the libraries and dependencies.
There are five libraries that we have thought of using as of now.

First among them is ‘Pandas’. Pandas is one of the most commonly used tools for data science and
machine learning, used for data cleaning and analysis, and it is well suited to handling real-world,
messy data. Pandas is an open-source Python package built on top of NumPy.

Then we import ‘NumPy’. NumPy is the fundamental package for numerical computing in Python. It
provides fast N-dimensional arrays and vectorized mathematical operations, and most scientific
Python packages, including Pandas, are built on top of it.

There is a Python module named ‘time’. It provides many ways of representing time in code,
such as objects, numbers, and strings. It also provides functionality other than representing
time, like waiting during code execution and measuring the efficiency of your code.

The next library we import is ‘re’. A regular expression (or RE) specifies a set of strings that
matches it; the functions in this module let you check if a particular string matches a given
regular expression (or if a given regular expression matches a particular string, which comes
down to the same thing).

Finally, we use ‘Pickle’. Pickle in Python is primarily used in serializing and deserializing a Python
object structure. In other words, it's the process of converting a Python object into a byte stream
to store it in a file/database, maintain program state across sessions or transport data over the
network.

These are the tentative libraries. We may use additional or different libraries if the need arises
during the course of the project.
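
A minimal sketch of these imports, assuming a standard Python environment (the deep learning
framework itself, e.g. TensorFlow, would be added once the model code is written):

```python
# Tentative imports for the project.
import pandas as pd   # data loading, cleaning, and analysis
import numpy as np    # numerical arrays and vectorized operations
import time           # representing time and timing code execution
import re             # regular expressions for text cleaning
import pickle         # serializing/deserializing Python objects
```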

2. DATA:

After we have our libraries in place, we load the data into the project using a Pandas data frame.
Once the data is loaded, we must understand it. The dataset contains data collected from Inshorts
and consists of five columns. Of these, we only need two, namely Headline and Short. The other
three are of no use for the purpose of this project, so we drop them.

After dropping these columns, the data frame contains only the Headline and Short columns.
Checking its dimensions with “.shape” shows a dataset of 2 columns and 55,104 rows.
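
A rough sketch of this step is shown below; the file name "inshorts_news.csv" is a hypothetical
placeholder, and only the "Headline" and "Short" column names are taken from the dataset
description above:

```python
import pandas as pd

# Hypothetical file name for the Inshorts dataset.
df = pd.read_csv("inshorts_news.csv")

# Keep only the two columns needed for summarization; the remaining
# columns (such as source, time, and date) are dropped.
df = df[["Headline", "Short"]]

print(df.shape)   # expected to show (55104, 2)
```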

3. Pre-processing:

Now, this is a very important step. Before building the actual model, it is crucial to pre-process
the data. Using messy, uncleaned data can prove disastrous. Hence, we need to drop all the unwanted
symbols, characters, and so on from the text so that they do not adversely affect the objective of
our project.

Following are the pre-processing tasks we will be performing (a sketch of these steps follows the
list):

• Convert everything to lowercase
• Remove (‘s)
• Remove any text inside parentheses ()
• Eliminate punctuation and special characters
• Remove stop words
• Remove short words
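
A minimal sketch of such a cleaning function, assuming NLTK's English stop-word list is available
and treating words shorter than three characters as "short words" (both are assumptions, not fixed
choices):

```python
import re
from nltk.corpus import stopwords  # assumes nltk and its stopword corpus are installed

STOP_WORDS = set(stopwords.words("english"))

def clean_text(text, min_word_len=3):
    text = text.lower()                       # convert everything to lowercase
    text = re.sub(r"\([^)]*\)", "", text)     # remove any text inside parentheses
    text = re.sub(r"'s\b", "", text)          # remove ('s)
    text = re.sub(r"[^a-z\s]", " ", text)     # eliminate punctuation and special characters
    words = [w for w in text.split()
             if w not in STOP_WORDS           # remove stop words
             and len(w) >= min_word_len]      # remove short words
    return " ".join(words)
```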

4. METHOD:

The method we will be using here is the ‘Abstractive Method’, which will be implemented using deep
learning.

5. MODEL:

In language, what really matters is the order of the words and their position in a sentence.
Re-order the words, and the entire meaning or context of the sentence changes completely. Recurrent
Neural Networks (RNNs) handle this with built-in mechanisms that deal with the order of sequences.
The Transformer model, on the other hand, uses neither recurrence nor convolution, which is why it
can treat each data point as independent of the others. Positional information therefore needs to
be added to the model explicitly in order to retain the order of the words in a sentence, and for
this purpose we use a concept called ‘Positional Encoding’.
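
As an illustration, the sinusoidal positional encoding proposed in "Attention Is All You Need" can
be sketched in NumPy as follows (the choice of sinusoidal encoding follows that paper; other
encodings are possible):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings as in 'Attention Is All You Need'."""
    pos = np.arange(max_len)[:, np.newaxis]    # positions 0..max_len-1, shape (max_len, 1)
    i = np.arange(d_model)[np.newaxis, :]      # embedding dimensions, shape (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])  # sine on even indices
    angles[:, 1::2] = np.cos(angles[:, 1::2])  # cosine on odd indices
    return angles                              # shape (max_len, d_model)
```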

The network we will be using here to train the model is the feed-forward network (FFN). The
feed-forward neural network is a very popular model in machine learning; it can approximate general
functions and mitigate the curse of dimensionality. Feed-forward neural networks allow signals to
travel in one direction only, from input to output. There is no feedback (loops): the output of a
layer does not influence that same layer. Feed-forward networks tend to be simple networks that
associate inputs with outputs and can be used in pattern recognition.
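
A small sketch of such a position-wise feed-forward block in Keras, using the layer sizes from the
original Transformer paper (d_model = 512, dff = 2048; our final values may differ):

```python
import tensorflow as tf

def feed_forward(d_model=512, dff=2048):
    # Two linear transformations with a ReLU in between, applied
    # identically at every position of the sequence.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(dff, activation="relu"),
        tf.keras.layers.Dense(d_model),
    ])
```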

ARCHITECTURE:
The research paper “Attention Is All You Need” introduces a novel architecture, namely the
‘Transformer’. This model uses the attention mechanism and, similar to LSTM-based models,
transforms one sequence into another by following the encoder-decoder structure.

The Transformer model does not rely on recurrence or convolution for generating outputs.

As mentioned earlier, the Transformer model is built around two components: the encoder and the
decoder.

1. Encoder:

The encoder is placed on the left side of the architecture. It consists of a stack of N = 6
identical layers, each of which is composed of two sublayers.

a. The first sublayer implements a multi-headed self-attention mechanism, in which h heads each
receive a linearly projected version of the queries, keys, and values, producing h outputs in
parallel. These outputs are then combined to generate the final result.

b. The second sublayer is a fully connected feed-forward neural network, consisting of two linear
transformations with a ReLU activation in between.

All N = 6 layers apply the same linear transformations to all the words in an input sequence;
however, each layer employs its own weights and biases. Moreover, both sublayers have a residual
connection around them.

Each sublayer is followed by a normalization layer, which normalizes the sum of the sublayer's
input and the output generated by that sublayer.
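
The two sublayers, residual connections, and normalization described above can be sketched with
Keras built-in layers roughly as follows (layer sizes are the paper's defaults and only
illustrative; padding masks and dropout are omitted for brevity):

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    """One of the N = 6 identical encoder layers (illustrative sketch)."""

    def __init__(self, d_model=512, num_heads=8, dff=2048):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    def call(self, x):
        # Sublayer 1: multi-headed self-attention, then residual + layer norm.
        attn_out = self.mha(query=x, value=x, key=x)
        x = self.norm1(x + attn_out)
        # Sublayer 2: feed-forward network, then residual + layer norm.
        ffn_out = self.ffn(x)
        return self.norm2(x + ffn_out)
```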

2. Decoder:

The decoder can be found on the right side of the architecture. It shares a lot of similarities
with the encoder.

The decoder likewise consists of N = 6 layers, just like the encoder. In this case, however, each
layer has three sublayers instead of the encoder's two:

a. The first one receives the previous output of the stack (of the decoders). Then, it
augments the output with positional information and proceeds to implement multi-
head self-attention over it. Unlike the encoder which is designed to attend to all
words in the input sequence irrespective of their position in that particular
sequence, the decoder only attends to the preceding words. Hence, the prediction
for a specific word at its respective position can only depend on the known outputs
for the words that come before it in the sequence. In the multi-headed attention
mechanism, this is achieved by introducing a mask over the values produced by the
scaled multiplication of matrices Q and K. This masking makes the decoder
unidirectional (unlike the bidirectional encoder).
b. The second layer implements the same multi-headed self-attention mechanism as
the one implemented in the first sublayer of the encoder. In this part, however, the
mechanism receives the queries from the previous decoder sublayer as well as the
keys and values from the output of the encoder, allowing the decoder to attend to
all the words in the input sequence.
c. In the third layer, a fully connected feed-forward network is implemented, just like
the second sublayer of the encoder.

Again, the decoder sublayers have residual connections and are followed by normalization layers. As
mentioned, the decoder and the encoder share many similarities.
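
The look-ahead mask used in the first decoder sublayer can be sketched as follows; positions marked
1 correspond to "future" words, and the mask is scaled by a large negative number and added to the
scaled Q·Kᵀ scores before the softmax, so those positions receive near-zero attention weight:

```python
import tensorflow as tf

def look_ahead_mask(size):
    # Upper-triangular matrix of ones marks the positions the decoder
    # must not attend to (words that come later in the target sequence).
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

# For a 4-word target sequence, position i may only attend to positions 0..i.
print(look_ahead_mask(4).numpy())
```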

ALGORITHM:
The Transformer model runs as follows:

1. Each word forming an input sequence is transformed into a d_model-dimensional embedding vector.
2. Each embedding vector representing an input word is augmented by summing it (element-wise) with
a positional encoding vector of the same d_model length, hence introducing positional information
into the input.
3. The augmented embedding vectors are fed into the encoder block consisting of the two sublayers
explained above. Since the encoder attends to all words in the input sequence, irrespective of
whether they precede or succeed the word under consideration, the Transformer encoder is
bidirectional.
4. The decoder receives as input its own predicted output word from the previous time-step.
5. The input to the decoder is also augmented by positional encoding in the same manner done
on the encoder side.
6. The augmented decoder input is fed into the three sublayers comprising the decoder block
explained above. Masking is applied in the first sublayer in order to stop the decoder from
attending to the succeeding words. At the second sublayer, the decoder also receives the
output of the encoder, which now allows the decoder to attend to all the words in the input
sequence.
7. The output of the decoder finally passes through a fully connected layer, followed by a softmax
layer, to generate a prediction for the next word of the output sequence (a rough sketch of this
step follows).
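
A rough sketch of this final step, with an assumed target vocabulary size (about 29,000, roughly
the number of unique headline words quoted later) and a greedy choice of the next word:

```python
import tensorflow as tf

vocab_size = 29000   # assumption: roughly the number of unique headline words
d_model = 512

# Final fully connected layer projecting the decoder output onto the vocabulary.
final_layer = tf.keras.layers.Dense(vocab_size)

decoder_output = tf.random.normal((1, 10, d_model))  # (batch, target_len, d_model) placeholder
logits = final_layer(decoder_output)                  # (batch, target_len, vocab_size)
probs = tf.nn.softmax(logits, axis=-1)

# Greedy decoding: take the most probable word at the last time-step.
next_word_id = tf.argmax(probs[:, -1, :], axis=-1)
```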

DATASET:
In this project, we use the Inshorts dataset. Inshorts is a platform that collects news
from a wide range of sources and publishes each story as a summary, hence the name “Inshorts”.
Their tagline literally says, “Stay informed in 60 words”.

For training, we will use the headline as the summary of a news article, with the original
article as the context. Since we are aiming to summarize the text, data such as the source, time,
and date hold no significance for our purpose. That is why we have decided to remove these three
columns from the dataset and work only with the “Headline” and “Short” columns.

The dataset has close to fifty-five thousand news-headline pairs, with roughly 76,000 unique
words in the news data and almost 29,000 in the headlines. The longest news sample has around 469
words. The news articles average around 368 words in length, while the headlines average around
96 words.
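
These figures can be roughly reproduced from the cleaned data frame; the sketch below counts
whitespace-separated words (whether the quoted averages were computed on words or characters is not
stated, so the exact numbers may differ):

```python
# Assumes the data frame `df` with "Headline" and "Short" columns from earlier.
news_words = df["Short"].str.split()
headline_words = df["Headline"].str.split()

print("samples:", len(df))
print("unique news words:", news_words.explode().nunique())
print("unique headline words:", headline_words.explode().nunique())
print("longest news sample (words):", news_words.str.len().max())
print("average news length (words):", news_words.str.len().mean())
print("average headline length (words):", headline_words.str.len().mean())
```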

REFERENCE:

https://medium.com/inside-machine-learning/what-is-a-transformer-d07dd1fbec04

https://www.dominodatalab.com/blog/transformers-self-attention-to-the-rescue

https://machinelearningmastery.com/gentle-introduction-text-summarization/

https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/

https://towardsdatascience.com/a-quick-introduction-to-text-summarization-in-machine-learning-3d27ccf18a9f

https://data-flair.training/blogs/machine-learning-text-summarization/
