
Abstractive Text Summary Generation with Knowledge Graph Representation

Prottay Kumar Adhikary, Prachurya Nath, Mrinmoi Borah, Pankaj Dadure, and Partha Pakray

Department of Computer Science & Engineering
National Institute of Technology Silchar, India
{prottay71, prachuryanath00, krdadure, parthapakray}@gmail.com

Abstract. With the enormous expansion of blogs, news stories, and reports, extracting usable information from this vast number of textual documents is a laborious task. Automatic text summarization is an effective solution for summarizing these documents. The goal of text summarization is to compress long documents into brief summaries while retaining the important information and meaning. Many interesting summarization models have been proposed to handle challenges such as saliency, fluency, human readability, and generating high-quality summaries. In this work, we present the Text-To-Text Transfer Transformer model for the task of abstractive summarization with knowledge graph representation. The experimental results show that the Text-to-Text Transfer Transformer model produces more conceptual, comprehensible, and abstractive summaries. To evaluate the quality of the generated summaries, the ROUGE and BLEU scores have been taken into consideration.

Keywords: Text Summarization · Abstractive Summary · News Documents · Transformer · Knowledge Graph

1 Introduction

Text summarization [6] is one of the well-known applications of Natural Language Processing (NLP) and has been in the limelight for a few decades. With the advancement of the digital world and its effect on the burgeoning publication sector, finding the time to read an article, document, or book in full is often no longer feasible. Furthermore, as the number of published articles grows and traditional print publications are digitized, it has become quite hard to keep track of the increasing number of web publications. In this scenario, a summary of the text can serve to reduce the complexity of the texts. Generally, text summarization is the process by which a brief, consistent, and proficient summary of a lengthy text article is created, highlighting its major ideas [4]. The text summarization task has been categorized into two categories [2]: extractive summarization and abstractive summarization. Extractive summarization summarizes documents by identifying the key phrases in the original text and combining them to produce a shorter version of the document.

On the other hand, abstractive summarization is based on paraphrasing and shortening parts of a document using advanced natural language techniques. Abstractive summarization algorithms can produce new sentences and phrases to capture the significance of the source text, and when this abstraction is performed correctly with deep learning, it may help to overcome grammatical imprecision.
In the late 1950s, automatic summarization research began to attract the attention of the scientific community [12], when interest arose in automating the production of summaries of technical documents. Interest in the area waned for a few years until the topic was taken up by the Artificial Intelligence community [5]. For many years, it was assumed that summarization systems should understand the input text and compute an explicit (semantic) interpretation of the text in order to identify its essential content. Text summarization thus became an interesting application for testing the understanding capabilities of artificial systems. However, due to the complexity of the task, interest in this method of summarizing texts decreased rapidly, and text understanding itself became an open research area.
In this paper, we have investigated the Text-To-Text Transfer Transformer model for abstractive summarization. The experimental results showed that the Text-to-Text Transfer approach produced more conceptual, understandable, and abstractive summaries. The paper is structured as follows: Section 2 describes prior work related to text summarization. Section 3 gives a detailed account of the dataset. Section 4 provides a detailed description of the system architecture. Section 5 describes the experimental setup and results. Section 6 concludes with a summary and directions for further research and development.

2 Related Work

Text summarization has been investigated since the 1950s, and most of the early research focused on extractive summarization by analyzing the structure of the words in the document [14]. Recently, recurrent neural networks (RNNs) have shown a strong impact on image recognition, machine translation, and speech recognition. Motivated by this, Hu et al. [7] introduced the Large-Scale Chinese Short Text Summarization (LCSTS) dataset and used an RNN-based method for the task of text summarization. LCSTS contains almost 2 million Chinese texts with their summaries, and also includes relevance annotations for 10,666 texts with respect to their corresponding source summaries.

Initially, the attentional encoder-decoder RNN model showed remarkable performance for the task of machine translation. Nallapati et al. [13] deployed this off-the-shelf attentional encoder-decoder RNN model for the task of text summarization on two different datasets (the DUC corpus and the CNN/Daily Mail corpus) and achieved state-of-the-art performance. In addition, they proposed a new dataset for multi-sentence summarization. The generative adversarial network is one of the recognized deep learning models for the task of text generation.
Liu et al. [11] use the generator as an agent of reinforcement learning, which takes the text as input and generates abstractive summaries. In addition, a discriminator has been developed to help distinguish the generated summary from the ground-truth summary. The guided generation model combines the capabilities of abstractive and extractive summarization methods [9]. Herein, an extractive summarization method is used to obtain keywords from the text. Afterwards, a Key Information Guide Network (KIGN) encodes the keywords into key information, which guides the summary generation process. Unlike prior studies that used only a single encoder, the method of [17] uses a dual encoder, namely primary and secondary encoders. The primary encoder performs conventional coarse encoding, whilst the secondary encoder models the relevance of words and provides finer encoding depending on the input raw text and the previously generated output summary. The two-level encodings are merged and fed into the decoder to produce a more diversified summary, which can reduce repetition in long sequence generation. The multi-head attention summarization (MHAS) model [10] learns important information in distinct representation subspaces using a multi-head attention mechanism. To prevent the generated summary from repeating terms, the MHAS model accounts for the previously predicted words while generating new words. It can also learn the article's underlying structure by adding a self-attention layer to the typical encoder and decoder, allowing the model to effectively preserve the original information. To increase the model's performance, a multi-head attention distribution has been integrated into the pointer network.

The sequence-to-sequence framework of [8] employs a keyword-aware encoder (KAE), which enriches the RNN-based encoder. It combines keyword information into word representations and distills important information using keywords. Experiments on datasets in a variety of languages and lengths reveal that the KAE model improves performance significantly and is competitive with the most recent state-of-the-art methods. The COSUM approach [1] combines clustering and optimization techniques: it identifies topics using the k-means algorithm and uses an optimization model to select the main sentences present in the resulting clusters. The optimization aims to balance coverage and diversity of the chosen phrases in the summary through a harmonic mean objective. This model also monitors the length of the selected sentences in the candidate summary to ensure that the summary is readable. The key part of extraction-based summarization is the identification of the important sentences of the text. The fuzzy-logic-based text summarization approach [3] improved the quality of summaries by incorporating latent semantics into the sentence features and by capturing the semantic connections between concepts extracted from the original text. A classification-summarization approach [16] categorizes tweets into various classes and then performs summarization. It is essentially a two-stage extractive-abstractive summarization framework. In the first stage, it obtains a number of important tweets from a wide range of information and builds a bigram-based text graph. Afterward, it uses an optimization technique to select the most significant tweets and paths based on various parameters, such as information and content coverage. In addition to the customized summarization model, it also addresses time-critical sparse information needs.

3 Dataset Description
The CNN/Daily Mail dataset from Hugging Face has been used to train, validate, and test the proposed model. Each article has an average of 28 sentences. Preparing the CNN/Daily Mail dataset for the summarization task is one of our contributions. The dataset consists mainly of news articles and highlight sentences. In the summarization setting, the highlight sentences are concatenated to form the reference summary of an article. We adopted parallel summarization in this task using this dataset. For training, 286,817 documents have been used; 13,368 documents have been used for validation; and 11,490 documents have been used for testing. The metadata of the dataset is shown in Table 1, and a minimal loading sketch follows the table.

Table 1. Dataset Description

Split       Number of documents
Training    286,817
Validation  13,368
Testing     11,490
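As an illustration of preparing this dataset, the following minimal Python sketch loads the three splits and joins the highlight sentences into a reference summary. It assumes the Hugging Face datasets library and the cnn_dailymail configuration name, version "3.0.0"; these are illustrative assumptions, not necessarily the exact setup used in our experiments.

```python
from datasets import load_dataset

# Load the CNN/Daily Mail dataset from the Hugging Face hub
# (the "3.0.0" configuration is an assumption; adjust as needed).
dataset = load_dataset("cnn_dailymail", "3.0.0")

print({split: len(dataset[split]) for split in dataset})
# Expected sizes: train 286,817 / validation 13,368 / test 11,490

sample = dataset["train"][0]
article = sample["article"]      # full news article text
highlights = sample["highlights"]  # highlight sentences, newline-separated

# Concatenate the highlight sentences into a single reference summary.
reference_summary = " ".join(highlights.split("\n"))
print(reference_summary)
```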

4 System Architecture
The proposed approach comprises two important modules, i.e., abstractive summary generation and knowledge graph representation. The summary generation module generates the summary, whereas the knowledge graph representation module represents the key concepts of the document in the form of nodes and edges, where nodes represent the subjects and objects and edges represent the verbs. The workflow of the proposed approach is depicted in Figure 1, where each module works sequentially.

Fig. 1. Workflow of the Proposed Approach



4.1 Summary Generation


The designed text summarization approach utilizes the Text-to-Text Transfer Transformer (T5) architecture [15]. T5 is a text-to-text encoder-decoder model that has been pre-trained on a mixture of unsupervised and supervised tasks. Every task is framed as feeding text in as input and training the model to output some target text.
Training: T5 is an encoder-decoder paradigm that casts all NLP tasks as text-to-text sequence transduction. The T5 architecture is trained with teacher forcing. This means that, to accomplish a particular task, the model always needs an input sequence and a target sequence for training. The model receives the input sequence via the input ids. The target sequence is prepended with a start-of-sequence token and fed to the decoder via the decoder input ids. The target sequence is subsequently appended with the EOS token and corresponds to the labels in the teacher-forcing style. The PAD token is used as the start-of-sequence token. In the designed text summarization task, the input and output form a standard sequence-to-sequence mapping, where the input is the long sequence of the news document and the output is the short sequence of the news summary. In general, only two fields are required in order to train the model: the input ids (the encoded input sequence) and the labels (the encoded target sequence). Moreover, the model generates the decoder input ids from the labels. A minimal fine-tuning sketch is given below.
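The following minimal sketch illustrates this input/label preparation with the Hugging Face transformers library; the t5-small checkpoint, the "summarize:" task prefix, and the maximum sequence lengths are illustrative assumptions rather than our exact training configuration.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# "t5-small" is an assumed checkpoint for illustration.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

article = "Long news article text ..."
summary = "Short reference summary ..."

# T5 frames every task as text-to-text; a task prefix marks summarization.
inputs = tokenizer("summarize: " + article,
                   max_length=512, truncation=True, return_tensors="pt")
labels = tokenizer(summary,
                   max_length=128, truncation=True, return_tensors="pt").input_ids

# With teacher forcing, passing the labels is enough: the model internally
# shifts them right (prepending the PAD token as the start token) to build
# the decoder input ids, and returns the cross-entropy loss.
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
outputs.loss.backward()  # one training step (optimizer omitted for brevity)

# At inference time, the summary is generated autoregressively.
generated = model.generate(inputs.input_ids, max_length=128, num_beams=4)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```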

4.2 Knowledge Graph Representation


The prime aim of this module is to visualize the information depicted in the news articles. To generate the graph, the spaCy Python library (with its large English model, en_core_web_lg) has been used to extract the key details of the information depicted in the news articles. Afterwards, the networkx package has been deployed to build the graph as a network of nodes and edges. Herein, the nodes are primarily the subjects and objects of the sentences of the news articles, and they are connected by edges which represent the verbs of the same sentences. Finally, matplotlib.pyplot has been used to visualize the graph. Figure 2 provides a pictorial representation of the generated knowledge graph, and a simplified construction sketch follows.
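A simplified sketch of this pipeline is shown below; the subject-verb-object extraction heuristic is an assumption for illustration, as the exact dependency rules are not specified here.

```python
import spacy
import networkx as nx
import matplotlib.pyplot as plt

# Requires: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

def extract_triples(text):
    """Extract naive (subject, verb, object) triples from each sentence."""
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children
                            if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children
                           if c.dep_ in ("dobj", "attr", "dative")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.text, token.lemma_, o.text))
    return triples

text = "The committee approved the budget. The minister announced new reforms."
graph = nx.DiGraph()
for subj, verb, obj in extract_triples(text):
    # Nodes are subjects/objects; the edge label carries the verb.
    graph.add_edge(subj, obj, label=verb)

pos = nx.spring_layout(graph, seed=42)
nx.draw(graph, pos, with_labels=True, node_color="lightblue", node_size=1500)
nx.draw_networkx_edge_labels(graph, pos,
                             edge_labels=nx.get_edge_attributes(graph, "label"))
plt.show()
```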

5 Experimental Results
The proposed approach has been tested on the CNN/Daily Mail dataset, which consists of news articles and their corresponding summaries. The efficiency of the proposed approach is measured in terms of ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. All these metrics are calculated for each test document and then averaged over all the documents present in the test set. To obtain the ROUGE scores, the predicted summaries are compared against the source summaries. The proposed approach delivers noticeable results for the task of abstractive text summarization. The experimental results have been compared with those of the state-of-the-art Sequence-to-Sequence RNN approach [13], against which the proposed approach achieves better results. Table 2 shows the comparative analysis of the obtained results; a sketch of how these scores can be computed is given after the table.
Fig. 2. Knowledge Graph

In Figure 3, we show a comparative analysis of the summary generated by the proposed system and the source summary. An evaluation score of 3 indicates that the approach generates a semantically correct summary without losing any important information, a score of 2 indicates that the generated summary is partially similar to the source summary, and a score of 1 indicates that the generated summary is not similar to the source summary.

Table 2. Experimental Results

Metric    Sequence-to-Sequence RNN [13]   Proposed Approach
ROUGE-1   35.46                           40.79
ROUGE-2   13.30                           18.55
ROUGE-L   32.62                           34.8
BLEU      NA                              43.5
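As an illustration of how these per-document scores can be computed before averaging, the following sketch uses the rouge-score and nltk packages; both package choices and the smoothing setting are assumptions, not necessarily our exact evaluation setup.

```python
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat"         # source (reference) summary
prediction = "a cat was sitting on the mat"  # model-generated summary

# ROUGE-1, ROUGE-2, and ROUGE-L F1 scores for a single document.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
scores = scorer.score(reference, prediction)
for name, score in scores.items():
    print(name, round(score.fmeasure * 100, 2))

# Sentence-level BLEU, smoothed because summaries are short.
bleu = sentence_bleu([reference.split()], prediction.split(),
                     smoothing_function=SmoothingFunction().method1)
print("BLEU", round(bleu * 100, 2))

# The reported numbers are these metrics averaged over all test documents.
```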

Fig. 3. Comparative analysis of the source summary and generated summary



• The obtained summaries show that the proposed T5-based summarization approach can understand the long, coherent semantic meaning of a large context.
• The obtained results of the proposed approach have been compared with the existing Sequence-to-Sequence RNN approach, and the proposed approach achieved better results.
• The designed system can also map the relationships between sentences in a paragraph and generate meaningful summaries based on the requirements of the texts.
• The documents contained in the CNN/DM dataset are written in formal English. As a result, the model sometimes struggles to generate summaries for documents written in informal English.
• The knowledge graph provides a visualization of the textual information and highlights the key terms occurring in the documents.

6 Conclusions and Future Scope


In this work, we have presented the Text-To-Text Transfer Transformer model for the task of abstractive summarization, with very promising results that significantly outperform state-of-the-art results on the CNN/Daily Mail dataset. In addition, we have also incorporated a knowledge graph to visualize the key concepts depicted in the documents. The experimental results showed that the Text-to-Text Transfer Transformer model produces more conceptual, comprehensible, and abstractive summaries. To evaluate the quality of the generated summaries, the ROUGE and BLEU scores were taken into consideration. The experiments achieved values of 40.79, 18.55, 34.8, and 43.5 for ROUGE-1, ROUGE-2, ROUGE-L, and BLEU, respectively. In the future, our study will focus on improving the model's accuracy by training it with new datasets from various domains. To improve the visual representation of the knowledge graph, nodes could be constructed based on importance and frequency, which would help in the display of long paragraphs.

Acknowledgement
The work presented here falls under the Research Project Grant No. IFC/4130/DST-CNRS/2018-19/IT25 (DST-CNRS targeted program). The authors would like to express gratitude to the Centre for Natural Language Processing and Artificial Intelligence Lab, Department of Computer Science and Engineering, National Institute of Technology Silchar, India for providing infrastructural facilities and support.

References
1. Alguliyev, R.M., Aliguliyev, R.M., Isazade, N.R., Abdi, A., Idris, N.: COSUM: Text summarization based on clustering and optimization. Expert Systems 36(1), e12340 (2019)

2. Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E.D., Gutierrez,
J.B., Kochut, K.: Text summarization techniques: a brief survey. arXiv preprint
arXiv:1707.02268 (2017)
3. Babar, S., Patil, P.D.: Improving performance of text summarization. Procedia
Computer Science 46, 354–363 (2015)
4. Day, M.Y., Chen, C.Y.: Artificial intelligence for automatic text summarization. In:
2018 IEEE International Conference on Information Reuse and Integration (IRI).
pp. 478–484. IEEE (2018)
5. DeJong, G.: An overview of the FRUMP system. Strategies for Natural Language Processing 113, 149–176 (1982)
6. Fattah, M.A., Ren, F.: Automatic text summarization. World Academy of Science,
Engineering and Technology 37(2), 192 (2008)
7. Hu, B., Chen, Q., Zhu, F.: LCSTS: A large scale Chinese short text summarization dataset. arXiv preprint arXiv:1506.05865 (2015)
8. Hu, T., Liang, J., Ye, W., Zhang, S.: Keyword-aware encoder for abstractive text
summarization. In: International Conference on Database Systems for Advanced
Applications. pp. 37–52. Springer (2021)
9. Li, C., Xu, W., Li, S., Gao, S.: Guiding generation for abstractive text summariza-
tion based on key information guide network. In: Proceedings of the 2018 Confer-
ence of the North American Chapter of the Association for Computational Linguis-
tics: Human Language Technologies, Volume 2 (Short Papers). pp. 55–60 (2018)
10. Li, J., Zhang, C., Chen, X., Cao, Y., Liao, P., Zhang, P.: Abstractive text summarization with multi-head attention. In: 2019 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. IEEE (2019)
11. Liu, L., Lu, Y., Yang, M., Qu, Q., Zhu, J., Li, H.: Generative adversarial network
for abstractive text summarization. In: Thirty-second AAAI conference on artificial
intelligence (2018)
12. Luhn, H.P.: The automatic creation of literature abstracts. IBM Journal of research
and development 2(2), 159–165 (1958)
13. Nallapati, R., Zhou, B., Gulcehre, C., Xiang, B., et al.: Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv preprint arXiv:1602.06023 (2016)
14. Nenkova, A., McKeown, K.: Automatic summarization. Now Publishers Inc (2011)
15. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li,
W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text
transformer. arXiv preprint arXiv:1910.10683 (2019)
16. Rudra, K., Goyal, P., Ganguly, N., Imran, M., Mitra, P.: Summarizing situational
tweets in crisis scenarios: An extractive-abstractive approach. IEEE Transactions
on Computational Social Systems 6(5), 981–993 (2019)
17. Yao, K., Zhang, L., Du, D., Luo, T., Tao, L., Wu, Y.: Dual encoding for abstractive
text summarization. IEEE transactions on cybernetics 50(3), 985–996 (2018)
