
IETE Technical Review

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/titr20

Studying the Effect of Syntactic Simplification on Text Summarization

Niladri Chatterjee & Raksha Agarwal

To cite this article: Niladri Chatterjee & Raksha Agarwal (2023) Studying the Effect of Syntactic Simplification on Text Summarization, IETE Technical Review, 40:2, 155-166, DOI: 10.1080/02564602.2022.2055670

To link to this article: https://doi.org/10.1080/02564602.2022.2055670

Published online: 31 Mar 2022.
IETE TECHNICAL REVIEW
2023, VOL. 40, NO. 2, 155–166
https://doi.org/10.1080/02564602.2022.2055670

Studying the Effect of Syntactic Simplification on Text Summarization


Niladri Chatterjee and Raksha Agarwal
Department of Mathematics, Indian Institute of Technology Delhi, New Delhi 110 016, India

ABSTRACT
The need for automatic text summarization (ATS) has increased manifold in recent times due to the overwhelming growth of textual data available in electronic form. However, existing ATS systems suffer from two major shortcomings. Summarizers of the extractive type, that is, the ones which select important sentences of the documents in their original form as the output, tend to copy some irrelevant or unimportant parts of the input text into the output summary. On the other hand, abstractive summarizers, that is, the ones that produce a gist of limited size of the original document, often fail to include important contents in the generated summary. Simplification of the input texts before submitting them to the ATS system(s) may obliterate the above difficulties. The present work examines the effectiveness of simplification of the input for five different known ATS systems. In this work, the DEPSYM++ simplifier has been used for the above purpose, which carries out four different kinds of simplification on sentences of the input text corresponding to the presence of appositive clause, relative clause, conjoint clause, and passive voice. The results obtained are found to be very encouraging when experiments were carried out on three different gold data sets and under different evaluation metrics commonly used for performance evaluation of summarizers.

KEYWORDS
Abstractive summary; Relative clause; ROUGE metric; Syntactic simplification; Text processing; Text simplification; Text summarization

1. INTRODUCTION

Automatic text summarization (ATS) aims at providing a concise and crisp overview containing the important information of a given text. ATS systems can be classified as extractive and abstractive [1]. Extractive summarizers rank the input sentences on the basis of their importance using different techniques, such as occurrence of frequent words [2], graphical representation [3,4], matrix factorization [5,6], and fuzzy sentence similarity [7]. The top-ranking sentences are then selected to constitute the summary. The major limitation of the above approach is that it may result in summaries of low quality, particularly when the input text contains long sentences having many different concepts. Presenting them in toto in the summary leads to the inclusion of unimportant contents in the summary. On the other hand, in abstractive summarization, the input text is typically modified or rewritten to convey the information more meaningfully and succinctly.

The aim of the current paper is to improve the quality of summarization, both extractive and abstractive, by first simplifying the input text at the preprocessing stage and then applying the summarization technique to the simplified text. More specifically, here we try to investigate the impact of prior simplification of the input text on the quality of the summary generated through different summarizers.

The above motivation emerges from the observation of the following research gap with respect to abstractive summarization. Abstractive summarization through sentence compression has been practiced for more than two decades now [8,9], but its performance is often poor. In compression-based abstractive summarization, parts of an input sentence are deleted on the basis of its syntactic structure and their statistical properties [10]. However, the decision regarding which parts of the sentence are to be deleted often ignores the information contained in them [11]. As a consequence, the sentence compression process may end up pruning parts of the sentence with higher information content. As a remedy to this problem, in the present work, syntactic simplification is performed on the sentences of the given text before performing summarization.

In syntactic simplification, the structure of a sentence is modified by rearranging the words and/or splitting a sentence into two or more shorter sentences [12]. The inherent purpose is to reduce the structural complexity of a given sentence while preserving its meaning. The present work uses the syntactic simplification system DEPSYM [13] for the above purpose. However, in order to use it more effectively for summarization purposes, we made some modifications to the original DEPSYM algorithm, which is termed DEPSYM++. DEPSYM++ splits the

© 2023 IETE

Figure 1: Overview of the proposed approach

input sentence into multiple parts without deleting any information. The simplified version is then served as input to different extractive and abstractive summarization systems. Figure 1 presents an overview of the proposed approach. The purpose is to compare the performance of different summarization systems on simplified and original texts. Different metrics have been used to evaluate their performance.

The novelty of the proposed scheme lies not only in suggesting syntactic simplification as a preprocessing step for generating summaries of better quality, but also in fine-tuning the DEPSYM algorithm for modifying the raw text suitably. Text simplification is a complicated NLP task with various ramifications. Different types of simplification techniques have been used in the literature for summarization, including end-to-end dual-objective training, as discussed in Section 2. However, despite DEPSYM++ being a lightweight simplification system based only on dependency parsing, the experiments conducted for this research show that its application in the preprocessing step provides a significant boost to the performance of five different summarization systems, as shown in Section 5.

The paper is organized as follows. Section 2 describes related works on the application of text simplification for text summarization. The syntactic simplification algorithm used in the present work is described in Section 3. Section 4 provides a description of the different summarization algorithms. Experimental details and data statistics are presented in Section 5. Results corresponding to different evaluation metrics are presented in Section 6. The paper is concluded in Section 7.

2. RELATED WORK

The utility of text simplification has been demonstrated in various NLP tasks, such as machine translation [14,15], question answering [16], semantic role labeling [17,18], and text summarization [11,19]. The effect of simplification has also been studied for the cognitive tasks of reading comprehension [20] and listening comprehension [21]. In this section, a review of existing works on the application of text simplification for the task of text summarization is presented.

The use of simplification techniques as a preprocessing step for text summarization has been practiced for nearly two decades. Siddharthan et al. [11] used syntactic simplification rules for improving content selection in multi-document summarization. Here, relative clauses and appositive clauses of the input sentences are deleted to improve sentence clustering. The summary is generated by selecting at most one sentence from each cluster.

Vanderwende et al. [22] used heuristic templates for the elimination of syntactic units from the input sentences based on parser output. Here, syntactic patterns, such as noun appositives, gerundive clauses, nonrestrictive relative clauses, and lead adverbials, were deleted from the input sentences. It was observed that although such deletions perform well on non-redundancy and content responsiveness, the grammatical quality and referential clarity of the summary were compromised.

Silveira and Branco [19] used sentence simplification at the summary generation phase to produce highly informative summaries. A double clustering approach based on sentence similarity and keyword similarity is used for content selection. In order to reduce redundancy, only the highest scoring sentence from each cluster is retained at each step. After content selection, the sentences are simplified by removing dispensable phrases. These were identified by isolating subtrees corresponding to five different syntactic structures from the constituency parse tree of the sentences. However, as pointed out by the authors themselves, such a simplification system, if applied before content selection, may affect the efficiency of double clustering by removing significant informative parts of the sentence.

Finegan-Dollak et al. [23] performed sentence simplification, compression, and disaggregation for summarization

of texts from biomedical and legal domains. The simplification system performs syntactic as well as lexical simplifications. The disaggregation system splits an input sentence into multiple sentences while maintaining the relevant dependencies. The summaries for input documents with different levels of preprocessing were generated using LexRank [4]. It was observed that summaries generated after applying simplification and disaggregation performed better than compression with respect to comprehension score and extrinsic evaluation.

Zaman et al. [24] combined text simplification and text summarization by extending the end-to-end pointer generator model. The authors used a combined loss function consisting of likelihood loss, coverage loss, and simplification loss. The work suggests that the inclusion of simplification objectives in the summarization model yielded better ROUGE scores for generation of summaries for scholarly articles.

Vale et al. [25] studied the effect of different simplification systems on extractive summarization techniques. Four simplification systems corresponding to a rule-based method, an optimization method, a supervised deep learning model, and an unsupervised deep learning model were used to preprocess the input documents. For ranking the sentences, word-based, sentence-based, and graph-based scoring techniques were used. It was observed that with respect to the different scoring techniques, the simplified inputs obtained higher ROUGE-1 scores in comparison with raw inputs. The summaries generated after applying rule-based and supervised deep learning model-based simplification demonstrated significant gains in terms of the evaluation metrics.

3. TEXT SIMPLIFICATION

Transformations based on the syntactic structure of the sentence have been utilized for performing text simplification by many different works in the literature. These include transformation rules using typed dependency representations [26] and synchronous dependency grammars [27]. Ferres et al. [28] and Scranton et al. [29] used dependency parse structures, while Garin et al. [30] used constituency parse trees to perform simplification transformations. Evans and Orasan [31] used a sign tagger to identify compound clauses and rule-based transformations for rewriting.

DEPSYM++ used in this work utilizes rules based on the dependency tree structure of the sentence for simplification of appositive clauses, relative clauses, and conjoint clauses, along with passive-to-active conversion. The modifications performed on DEPSYM for simplification of relative and conjoint clauses are described in Sections 3.1 and 3.2, respectively.

3.1 Relative Clause

For sentences containing relative clauses, the output provided by DEPSYM does not insert the noun phrase within the relative clause, and the relative pronoun is also retained in the output. For illustration, the sentence

    John teaches art in Sun City, where he goes to play golf

is simplified as a pair of sentences:

    John teaches art in Sun City.
    Sun City is where he goes to play golf.

The following improvements have been incorporated in the DEPSYM++ algorithm for different relative pronouns:

• For the relative pronoun where, the noun phrase along with its syntactical head is inserted after the relative phrase, as described in Figure 2. In the figure, the noun phrase Sun City and its head in are inserted after the relative phrase.
• For the relative pronoun whose, the noun phrase is converted to its possessive form and the relative phrase is inserted after the noun phrase, as described in Figure 3. In the figure, John is converted to John's and the relative phrase house is in Sun City is attached to it.
• For other relative pronouns, if the relative token (the token with the relcl/rcmod tag) succeeds the relative pronoun in the sentence, then the noun phrase is inserted just before the relative phrase. If the relative token does not succeed the relative pronoun in the sentence, then the noun phrase is inserted in the relative phrase before the phrase corresponding to the right subtree of the relative token.

3.2 Conjoint Clause

The DEPSYM algorithm splits sentences containing conjoint clauses. The relevant conjunctions are appended as the first token of the resulting sentence. For illustration, consider the sentence

    John went to play golf but did not take his car.

This sentence is simplified as:

    John went to play golf. But John did not take his car.
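The DEPSYM++ post-processing on conjoint clauses (dropping a leading "and"/"but" from a split sentence, as described in Section 3.2 and below Figure 3) can be sketched as follows. This is a simplified, string-level illustration only; the actual DEPSYM++ rules operate on a dependency parse.

```python
def strip_leading_conjunction(sentence: str) -> str:
    """Drop a leading coordinating conjunction, as DEPSYM++ does when
    "and"/"but" appears as the first word of a simplified sentence."""
    for conj in ("And ", "But "):
        if sentence.startswith(conj):
            rest = sentence[len(conj):]
            return rest[0].upper() + rest[1:]  # re-capitalize the remainder
    return sentence

# The second sentence produced for the example above:
print(strip_leading_conjunction("But John did not take his car."))
```

After this step, the pair produced for the example sentence reads "John went to play golf." and "John did not take his car."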

Figure 2: Simplification of relative clause with relative pronoun where

Figure 3: Simplification of relative clause with relative pronoun whose

After sentence splitting, the selection of each individual sentence in the summary is independent of the presence of its context in the summary. Thus, in DEPSYM++ the conjunctions "and" and "but" are removed from the simplified sentences if they appear as the first word.

4. TEXT SUMMARIZATION

For summarization of input documents, five different techniques covering both extractive and abstractive summarization have been considered. For extractive summarization, graph-based [3,4], term-frequency-based [2], and matrix factorization-based [5] techniques have been used, while Transformer-based neural summarization [32] has been used for generation of abstractive summaries. The details are as follows.

4.1 TextRank

TextRank [3] is a graph-based sentence ranking technique. Here, a graph is constructed where each node represents a sentence and weighted edges represent the similarity between the respective sentences. For illustration, suppose Si and Sj are two sentences with N and M words, respectively, and Si ∩ Sj contains the common words of Si and Sj. Then Sim(Si, Sj) is defined as shown in Equation (1). Nodes of the graph are ranked using the PageRank [33] algorithm. The rank of a sentence Si is calculated using Equation (2), where N(Si) denotes the neighbors of the node representing Si, and d is the damping factor, which is set to 0.85 [33]:

    Sim(Si, Sj) = Count(Si ∩ Sj) / (log(N) + log(M))    (1)

    Rank(Si) = (1 − d) + d · Σ_{Sj ∈ N(Si)} [ Sim(Si, Sj) / Σ_{Sk ∈ N(Sj)} Sim(Sj, Sk) ] · Rank(Sj)    (2)

4.2 LexRank

LexRank [4] is a ranking algorithm based on eigenvector centrality in a graph representation of sentences. Here, the nodes of the graph are connected with weighted edges representing the cosine similarity of the term frequency-inverse document frequency (TF-IDF) vectors of the respective sentences. For illustration, suppose Si and Sj are two sentences with TF-IDF vectors Vi and Vj, respectively. Then Sim(Si, Sj) is defined as shown in Equation (3). The adjacency matrix of this sentence graph is a connectivity matrix based on intra-sentence cosine similarity. A similarity matrix is constructed by dividing each element of the adjacency matrix by the sum of the corresponding column entries. The left eigenvector corresponding to the eigenvalue 1 of this similarity matrix corresponds to its stationary distribution. The sentence/node ranking in LexRank is similar to that of TextRank, as discussed in Section 4.1.

    Sim(Si, Sj) = CosineSimilarity(Vi, Vj) = (Vi · Vj) / (||Vi|| · ||Vj||)    (3)

4.3 LSA

Latent Semantic Analysis (LSA) [5] summarizers use Singular Value Decomposition for the extraction of important sentences of a given document. Here, the input documents are represented using a sparse sentence matrix, where each column represents the weighted term-frequency vector of an individual sentence. Suppose a given input document D contains n sentences and m unique words. The sentence matrix M is an m × n matrix where Mij is the number of times the ith word appears in sentence Sj. Sentences of the document are ranked on the basis of the singular values of M.
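As a concrete illustration of Equations (1) and (2), the sketch below builds the sentence-similarity graph and runs the PageRank-style iteration. The whitespace tokenizer and the fixed iteration count are simplifying assumptions; the paper's pipeline uses NLTK preprocessing and the Sumy implementations instead.

```python
import math

def similarity(s1: str, s2: str) -> float:
    """Equation (1): shared-word count over log(N) + log(M)."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    denom = math.log(len(w1)) + math.log(len(w2))
    return len(set(w1) & set(w2)) / denom if denom > 0 else 0.0

def textrank(sentences, d=0.85, iterations=50):
    """Equation (2): PageRank over the weighted sentence graph."""
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    rank = [1.0] * n
    for _ in range(iterations):
        rank = [(1 - d) + d * sum(sim[i][j] / sum(sim[j]) * rank[j]
                                  for j in range(n)
                                  if sim[i][j] and sum(sim[j]))
                for i in range(n)]
    return rank

sentences = ["the cat sat on the mat",
             "the cat ate the fish",
             "quantum flux capacitors hum"]
ranks = textrank(sentences)
# The two overlapping sentences outrank the unrelated one.
```

A sentence with no neighbors keeps the baseline rank (1 − d), exactly as Equation (2) prescribes when the summation is empty.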
4.4 SumBasic

SumBasic [2] is a frequency-based summarizer. Here, the sentences are ranked on the basis of the average probability of the words in the sentence. The probability of each word is calculated as the ratio of the number of times it appears in the input document to the total number of words in the document. Suppose the input document consists of N words. The score of a sentence S = w1 w2 ... wm of m words is computed using Equation (4), where freq(wi) denotes the frequency of wi in the document:

    Score(S) = (1/m) Σ_{i=1..m} freq(wi) / N    (4)

4.5 BART

BART [32] is a denoising autoencoder used for pretraining sequence-to-sequence Transformer models for various NLP tasks, such as Question Answering, Token Classification, and Abstractive Summarization. It pretrains a model by combining Bidirectional and Auto-Regressive Transformers. The pretraining procedure is a two-step process: first, the input text is corrupted with an arbitrary noising function; then, a sequence-to-sequence model is learned to reconstruct the original text.

5. EXPERIMENTAL DETAILS

For summarization using sentence ranking techniques, namely TextRank, LexRank, LSA, and SumBasic, sentences of the input documents are preprocessed. In the preprocessing step, tokenization and stemming are performed using NLTK [34]. Stopwords are also removed using the NLTK list of English stopwords. Every input sentence is ranked (scored) using the TextRank, LexRank, LSA, and SumBasic techniques, as discussed in Section 4. The implementations of TextRank, LexRank, LSA, and SumBasic available in the Sumy package have been used in the present work. For each of the above techniques, the top-ranking sentences are included in the summary till a word limit of 100 is attained. For abstractive summarization using BART, the pre-trained model from the summarization pipeline of Transformers [35] has been used to generate 100-word summaries.

For evaluating the efficacy of the application of syntactic simplification on text summarization outputs, experiments have been conducted on documents from three data sets, namely DUC-2001, DUC-2002, and the CNN/Daily Mail [36] test dataset. In DUC-2001 and DUC-2002, each document is paired with 2–3 manual summaries. Experiments have been conducted on 100 documents from each of the DUC-2001 and DUC-2002 datasets. In the CNN/Daily Mail test data set, each document is paired with a set of 1–5 highlight sentences. The number of words in the highlights (reference summary) ranged from 9 to 676, with an average of about 58 words. A set of 100 documents is selected such that the total number of words in the reference highlight sentences of the selected documents lies in the range of 90–110. This helps in ensuring that the generated 100-word summary is compared with a reference summary having a similar number of words during the evaluation process. A huge mismatch in the length of the generated and reference summary could lead to bias in the calculation of the precision and recall terms. Another way could be to compare the generated summary with only the first 100 words of the reference summary, but this could lead to deletion of important facts from the reference which may otherwise be present in the generated summary, resulting in a lower evaluation metric score. The data statistics for the three datasets are reported in Table 1. It can be observed that the selected data set consists of documents having diverse characteristics. Upon application of DEPSYM++, it was observed that syntactic simplifications were performed for 80.91%, 81.91%, and 84.60% of the input sentences of the DUC-2001, DUC-2002, and CNN/Daily Mail datasets, respectively. This is because the proposed simplification algorithm simplifies only the sentences containing complex syntactical constructions, namely passive voice, appositive, relative, and/or conjoint clauses.

Table 1: Data statistics

Statistic                                               DUC-2001      DUC-2002      CNN/DM
Number of documents                                     100           100           100
Range of sentences per document                         (8, 123)      (8, 82)       (14, 46)
Range of sentences per document after simplification    (18, 247)     (19, 148)     (24, 165)
Average sentences per document                          37.49         27.71         39.20
Average sentences per document after simplification     86.9          60.27         94.93
Range of words per document                             (210, 2193)   (182, 1376)   (284, 1678)
Range of words per document after simplification        (228, 2497)   (200, 1447)   (311, 1846)
Average words per document                              899.09        624.39        935.12
Average words per document after simplification         996.06        691.44        1032.41
Number of reference summaries per document              3             2             1
Average words per reference summary                     113.463       104.695       98.62

6. EVALUATION METRICS AND RESULTS

In order to study the effect of syntactic simplification on text summarization, summaries were generated using different summarization techniques from raw inputs as well as after applying the preprocessing step of syntactic simplification. The performance has been measured using different evaluation metrics, such as ROUGE, BERTScore, and different shallow metrics. Tables 2–4 show the results with respect to ROUGE, BERTScore, and the shallow metrics, respectively. The higher score between simplified and raw (i.e. non-simplified) inputs for each summarization strategy has been bold-faced in the corresponding tables, and the highest value for each column has been italicized. The column Simplified indicates whether DEPSYM-based simplification has been applied to the input text or not.

6.1 Performance with Respect to ROUGE

Different ROUGE metrics [37] have been used to evaluate the summaries. ROUGE is a measure of overlap between system summaries and reference summaries using n-gram co-occurrence statistics. The ROUGE-n score of a system summary is computed using Equation (5):

    ROUGE-n = [ Σ_{R ∈ References} Σ_{gram_n ∈ R} Count_match(gram_n) ] / [ Σ_{R ∈ References} Σ_{gram_n ∈ R} Count(gram_n) ]    (5)

where Count_match(gram_n) is the maximum number of n-grams co-occurring in a system summary and a reference summary R, and Count(gram_n) is the total number of n-grams in the reference summary.

ROUGE-n scores for n = 1, 2 for summaries generated using the different summarization methods have been reported in Table 2. Additionally, ROUGE-L, which is based on Longest Common Subsequence matching, and ROUGE-SU4, which matches bigrams with a maximum skip distance of 4, have also been reported.

Figure 4 indicates the percentage change in ROUGE scores after simplification for different summarization schemes on different datasets. Let RSraw and RSsimple

Table 2: ROUGE (recall) scores for different summarization methods

                          DUC-2001                          DUC-2002                          CNN/DM
Method     Input        R-1     R-2     R-L     R-SU4     R-1     R-2     R-L     R-SU4     R-1     R-2     R-L     R-SU4
TextRank   Raw          32.86   13.79   30.38   14.73     37.41   15.93   29.58   16.94     34.91   12.77   32.30   14.19
           Simplified   34.13   14.26   31.96   15.14     40.30   17.66   32.96   18.48     40.70   16.16   38.03   17.64
LexRank    Raw          30.44   11.43   28.26   12.66     35.18   14.81   28.45   15.56     35.05   12.08   32.43   13.45
           Simplified   34.78   14.24   32.76   15.02     37.96   15.06   30.87   16.43     40.62   14.81   38.29   16.66
LSA        Raw          27.41   11.21   25.29   11.66     32.96   13.56   25.92   14.33     28.52    9.63   26.37   10.61
           Simplified   31.83   13.88   29.82   13.96     37.41   16.53   30.92   16.87     36.90   14.76   34.57   15.17
SumBasic   Raw          33.64   16.83   31.43   16.97     36.53   17.69   29.74   18.04     42.11   19.03   40.00   18.91
           Simplified   33.69   16.16   32.01   16.21     38.32   18.12   32.00   18.40     41.16   17.99   39.42   18.20
BART       Raw          32.06   14.31   30.39   14.90     34.42   15.26   28.63   15.46     42.13   20.22   40.59   20.24
           Simplified   33.00   14.96   31.48   15.15     35.60   16.16   29.92   16.31     42.59   21.26   40.49   20.47

Table 3: Precision, Recall and F1-Score of BERTScore for different summarization methods

                          DUC-2001                       DUC-2002                       CNN/DM
Method     Input        PBERT    RBERT    F1BERT       PBERT    RBERT    F1BERT       PBERT    RBERT    F1BERT
TextRank   Raw          58.79    59.73    59.23        60.66    62.66    61.61        58.03    60.51    59.21
           Simplified   58.24    60.44    59.29        61.31    62.16    61.70        60.91    61.71    61.28
LexRank    Raw          57.70    59.98    58.79        60.44    61.64    60.99        57.67    60.43    58.99
           Simplified   58.84    59.99    59.39        60.63    62.48    61.50        60.42    61.84    61.10
LSA        Raw          56.27    58.78    57.47        58.71    60.68    59.65        55.47    58.19    56.76
           Simplified   57.80    59.26    58.49        60.11    61.54    60.78        58.80    60.28    59.50
SumBasic   Raw          59.39    60.90    60.11        61.50    61.62    61.53        61.03    63.31    62.12
           Simplified   59.47    59.63    59.52        61.50    62.38    61.91        61.45    61.54    61.46
BART       Raw          61.28    59.80    60.51        62.53    60.99    61.72        64.71    63.51    64.09
           Simplified   61.87    59.88    60.83        63.39    60.79    62.03        65.57    65.08    65.30
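The ROUGE-n computation of Equation (5) in Section 6.1 can be illustrated with a minimal sketch; the whitespace tokenizer and lowercase normalization are simplifying assumptions relative to the official ROUGE toolkit used for the results above.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(system, references, n=1):
    """Equation (5): clipped n-gram matches over total reference n-grams."""
    sys_counts = Counter(ngrams(system.lower().split(), n))
    match = total = 0
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            match += min(count, sys_counts[gram])  # Count_match: clipped co-occurrence
            total += count
    return match / total if total else 0.0
```

Because the denominator counts reference n-grams, this formulation is recall-oriented, matching the recall scores reported in Table 2.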

Table 4: Properties of the generated summaries

                                     DUC-2001               DUC-2002               CNN/DM
Method           Input             ASL    SCR     FKGL    ASL    SCR     FKGL    ASL    SCR     FKGL
Reference                          15.3    3.1    10.3    14.9    0.2     9.8    13.4    1.4     7.7
Average of all   Raw               18.8   69.7    11.1    17.9   69.7    10.2    21.3   70.7    11.7
methods          Simplified        14.1   14.0     8.8    13.4   14.0     8.2    14.7   10.4     8.2
TextRank         Raw               22.8  100.0    14.3    20.5  100.0    12.0    24.7  100.0    14.5
                 Simplified        16.4   16.2    10.4    15.1   18.3     9.4    16.8   11.9     9.9
LexRank          Raw               19.5  100.0    11.5    18.1  100.0    10.5    21.5  100.0    12.2
                 Simplified        11.3   15.1     7.7    11.9   16.6     7.7    12.3    9.9     7.3
LSA              Raw               22.1  100.0    13.8    20.9  100.0    12.8    25.0  100.0    15.0
                 Simplified        13.7   12.8     8.8    12.7   13.8     7.9    14.6   10.2     8.6
SumBasic         Raw               19.1  100.0    11.4    18.0  100.0    10.8    21.0  100.0    12.2
                 Simplified        17.5   24.8    11.5    16.4   23.7    10.7    19.1   24.9    11.7
BART             Raw               15.0   16.2     8.7    14.6   16.4     8.1    18.2   20.6     8.9
                 Simplified        12.9   12.1     8.2    11.7    9.6     7.5    12.8    4.6     7.0
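The FKGL values in Table 4 follow the standard Flesch-Kincaid formula (Equation (9) in Section 6.3). A minimal sketch is given below; the vowel-group syllable heuristic is a crude stand-in for the dictionary-based counters that readability tools normally use.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    """Equation (9): 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Short sentences of short words drive the score down, which is why the simplified summaries in Table 4 show consistently lower FKGL than the raw ones.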

Figure 4: Percentage change in ROUGE Scores after application of DEPSYM Simplification for different summarization schemes

be the ROUGE scores obtained when summarization is This may be attributed to the naïve word frequency-based
performed on raw text and simplified text, respectively. sentence ranking approach of SumBasic.
The percentage change is calculated using the following
equation: It can also be observed that for DUC datasets the
outputs of sentence ranking or extractive summarizers
RSsimple − RSraw have significantly higher ROUGE score than the out-
× 100 (6) put generated using the abstractive neural summarizer,
RSraw
namely BART. The outputs from all other summarizers
except LSA have comparable performance for documents
It can be observed from Table 5 that TextRank, LexRank, of CNN/Daily Mail.
LSA, and BART gave better performance when simplifi-
cation is applied for all the three datasets under all the
four ROUGE metrics mentioned above. Application of
6.2 Performance with Respect to BERTScore
simplification with LSA summarization yields the highest
increase in ROUGE. In fact, it can be observed that with Although ROUGE metrics are widely accepted for sum-
respect to ROUGE-2, there is an increase of more than mary evaluation, it is biased towards lexical overlap. It
50% for LSA summarizer on the CNN/DailyMail dataset. overlooks the fact that semantic overlap between texts
can occur irrespective of lexical dissimilarity. In the
SumBasic summarizer shows better performance with- present work, BERTScore [38] addresses the above draw-
out simplification for DUC 2001 and CNN/DailyMail back in the following way. Similarity using BERTScore
datasets. However, the magnitude of decrease in scores is calculated as a weighted sum of cosine similari-
after the application of simplification is only marginal. ties between the tokens. It correlates well with human

judgment for evaluating different language generation tasks [39].

Let the system-generated sentence be S = s1 s2 ... sn and the reference be R = r1 r2 ... rm; then precision (PBERT) and recall (RBERT) for BERTScore are calculated as shown in Equation (7), where s⃗i and r⃗j denote the contextualized BERT embeddings of the words si and rj, respectively. The F1-Score (F1BERT) is calculated as the harmonic mean of PBERT and RBERT.

    PBERT = (1/n) Σ_{i=1..n} max_{rj ∈ R} r⃗j^T s⃗i ;    RBERT = (1/m) Σ_{j=1..m} max_{si ∈ S} r⃗j^T s⃗i    (7)

Table 3 provides the evaluation results for the generated summaries with respect to BERTScore for simplified and non-simplified inputs. Figure 5 indicates the percentage change in BERTScore after simplification for different summarization schemes on different datasets. The percentage change is calculated similar to the way described in Section 6.1, for precision, recall, and F1-Score across the three datasets. Here, the overall best values are obtained for summaries generated by BART with simplified inputs. From Table 3, it can be observed that, with respect to BERTScore, outputs generated using BART have higher values for all the datasets in comparison with the extractive systems. This can be ascribed to the similarity in the training objectives of BART and BERT. It can also be observed that for the CNN/DailyMail dataset, the magnitude of the increase in scores after application of simplification is very high for the extractive summarizers TextRank, LexRank, and LSA.

Table 5: Summary for article la082889-0067 of DUC-2002

Reference Summary:
Dave Johnson of the US clinched the decathlon gold medal in the final event of the World University Games. Another American, Sheldon Blockburger, was in third place in the decathlon until that last event but was overtaken by Szabo of Hungary. Medved of the USSR won the silver. Americans won gold and bronze in the men's 400-meter hurdles. Ana Quirot of Cuba beat Jearl Miles of the US in the women's 400 meters. American Llewellyn Starks was third in the long jump. Romanian Paula Ivan set a meet record in winning the women's 3,000-meter race.

TextRank:
Johnson trailed Mikhail Medved of the Soviet Union by 11 points before the 1,500-meter race, the final event of the two-day competition. Olympic steeplechase champion Julius Kariuki of Kenya, running the distance for the first time this year, won the men's 10,000-meter race, while the Romanian team earned six gold medals. Kariuki, who won the gold medal at the 1988 Seoul Games in the 3,000-meter steeplechase, beat out Zeki Oeztuerk of Turkey to win the 10,000 in the slow time of 28:35. Four of Romania's golds came in rowing and two in track and field, one of them from Olympic champion Paula Ivan, who set a meet record in winning the women's 3,000-meter race.

Syntactic Simplification + TextRank:
Dave Johnson of the United States clinched the Decathlon gold medal in the final event at the world University games Sunday. Johnson trailed Mikhail Medved of the Soviet Union by 11 points before the 1,500-Meter race, the final event of the two-day competition. Blockburger earned only 569 points in the final event. Kariuki beat out Zeki Oeztuerk of Turkey to win the 10,000 in the slow time of 28:35. Kevin Henderson took the bronze for the U.S. in the men's 400-meter intermediate hurdles. Olympic champion Paula Ivan set a meet record in winning the women's 3,000-Meter race.

6.3 Shallow Properties of the Generated Summary

In order to analyze the properties of the generated summaries, Average Sentence Length (ASL), Sentence Copy Rate (SCR), and Flesch-Kincaid Grade Level (FKGL) have been calculated. ASL is the mean number of words in the summary sentences. SCR of a summary is measured as the proportion of summary sentences that are exact copies of the source text sentences. The FKGL [40] score is calculated using the formula given in Equation (9). It is a readability metric designed to indicate the difficulty level of a given text; a lower FKGL score implies a lower level of difficulty. It can be noted that although the FKGL score does not measure the overall information content of the summary, it is a good indicator of how simple the summary is in terms of its ease of readability. Moreover, studying the change in FKGL scores of the summary after the application of DEPSYM++ also highlights the efficacy of simplification as a preprocessing step to summarization.

    FKGL = 0.39 × (number of words / number of sentences) + 11.8 × (number of syllables / number of words) − 15.59    (9)

ASL, SCR, and FKGL of the reference and generated summaries are reported in Table 4. Here, the rows corresponding to Average of all methods represent the mean of the metric values of the different summarization methods.

Figure 6 gives a visual representation of the percentage decrease in ASL, SCR, and FKGL after simplification for different summarization schemes on different datasets. Let Valraw and Valsimple be the feature value
in Equation (6). obtained when summarization is performed on raw text
and simplified text, respectively. The percentage decrease
It can be seen that the summaries generated using simpli- is calculated using the following equation
fied inputs have higher BERTScores for most of the meth-
ods. Similar to ROUGE, the application of simplification Valraw − Valsimple
× 100 (10)
with LSA summarizer leads to high gains in terms of Valraw
N. CHATTERJEE AND R. AGARWAL: STUDYING THE EFFECT OF SYNTACTIC SIMPLIFICATION ON TEXT SUMMARIZATION 163

Figure 5: Percentage change in BERTScores after application of DEPSYM Simplification for different summarization schemes

Figure 6: Percentage decrease in Shallow feature values after application of DEPSYM Simplification for different summarization schemes
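The shallow metrics ASL, SCR, and FKGL, together with the percentage decrease of Equation (10), lend themselves to a direct implementation. The following is a minimal sketch under stated assumptions: sentences are plain strings, words are whitespace tokens, and syllables are approximated by counting vowel groups (the exact tokenizer and syllable counter behind the reported values may differ). The helper names shallow_metrics and pct_decrease are our own.

```python
import re

def shallow_metrics(summary_sentences, source_sentences):
    """ASL, SCR and FKGL for a summary (Section 6.3, Equation 9)."""
    words = [w for s in summary_sentences for w in s.split()]
    n_sents = len(summary_sentences)
    n_words = len(words)
    asl = n_words / n_sents                 # average sentence length

    source = set(source_sentences)
    copied = sum(1 for s in summary_sentences if s in source)
    scr = copied / n_sents                  # sentence copy rate

    def syllables(word):
        # crude approximation: count maximal vowel groups, at least 1
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    n_syll = sum(syllables(w) for w in words)
    fkgl = 0.39 * (n_words / n_sents) + 11.8 * (n_syll / n_words) - 15.59
    return asl, scr, fkgl

def pct_decrease(val_raw, val_simple):
    """Percentage decrease of a feature value, as in Equation (10)."""
    return (val_raw - val_simple) / val_raw * 100
```

For instance, the drop in the TextRank FKGL from 10.9 to 6.5 quoted in the text corresponds, under Equation (10), to a percentage decrease of roughly 40%.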

It can be observed that summaries generated using raw inputs contain longer sentences than those generated using simplified inputs. Moreover, the ASL of summaries generated with simplified inputs is closer to the ASL of the manually written reference summaries.

There is a marked decrease in SCR values for all the summarizers across the three datasets. This indicates that the summaries generated after applying simplification are abstractive in nature; that is, even for extractive methods such as TextRank and LSA, simplified sentences are included in the summary. Even for the generative BART summarizer, the decrease in SCR is more than 20% for CNN/DailyMail and more than 40% for DUC.

FKGL values for the summaries generated with simplified inputs are lower than those generated with raw inputs. For DUC documents, the FKGL scores of the simplified summaries are even lower than those of the reference summaries. This indicates that syntactic simplification enhances the readability of the generated summary.

An example of the generated summary with and without simplification, along with a reference summary, is presented in Table 5. Here, it can be observed that syntactic simplification helps to break long sentences so that only the informative parts of the sentence are included in the summary. The FKGL score of the summary generated using TextRank is 10.9, while it reduces to 6.5 for the summary generated using TextRank after simplification.

7. CONCLUSION

In the era of the internet and smartphones, when an abundance of textual information can be retrieved with a single click, the need for efficient ATS systems is of prime importance. ATS systems help a user to go through the important information of a text in less time and also to decide its relevance for any specific task. Summary generation by the selection of top-ranked sentences often includes irrelevant phrases and concepts in the summary. This is especially true when the input text consists of long sentences. Deletion of phrases corresponding to a specific
syntactic structure without any consideration of its information content and relevance can affect the quality of the summary. In the present work, a syntactic simplification system, DEPSYM+, has been applied on the input text as a preprocessing step. In this research, the effect of syntactic simplification has been studied for five different summarization techniques of both extractive and abstractive types. Experiments have been conducted on documents from three datasets containing news articles, namely DUC-2001, DUC-2002, and CNN/Daily Mail. Results indicate that the application of syntactic simplification helps to generate better summaries in terms of the different evaluation metrics.

In the present work, the complexity of the input text is identified by the presence of four complex syntactic structures. Various machine learning-based methods have been developed in the literature for the automatic assessment of text complexity using sequence processing [41], linguistic features [42], shallow semantic features [43], and knowledge graphs [44]. In the future, we would like to incorporate such methods into the text simplification module. The present work can also be extended to study the effects of simplification on texts pertaining to specific domains, such as medical and legal texts, among others. The summary evaluation module may be expanded to include semantically enhanced Jaccard similarity, which has been used in the literature for evaluating automatic summaries of clinical texts [45].

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

FUNDING

Raksha Agarwal acknowledges the Council of Scientific and Industrial Research (CSIR), India, for supporting the research [grant no: SPM-06/086(0267)/2018-EMR-I].

NOTES

1. Code available at https://github.com/RakshaAg/DEPSYMSum.
2. https://pypi.org/project/sumy/.
3. https://huggingface.co/facebook/bart-large-cnn.
4. Dataset is available at https://github.com/RakshaAg/DEPSYMSum.
5. bert-base-uncased.
6. Except recall for DUC.

ORCID

Niladri Chatterjee http://orcid.org/0000-0003-3832-4003
Raksha Agarwal http://orcid.org/0000-0002-9356-4387

REFERENCES

1. I. Mani, and M. T. Maybury. Advances in automatic text summarization. Cambridge, MA: MIT Press, 1999.

2. A. Nenkova, and L. Vanderwende, "The impact of frequency on summarization," Microsoft Research, Redmond, Washington, Tech. Rep. MSR-TR-2005-101, 2005.

3. R. Mihalcea, and P. Tarau. "TextRank: Bringing order into text," in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004, pp. 404–11.

4. G. Erkan, and D. R. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," J. Artif. Intell. Res., Vol. 22, pp. 457–79, 2004.

5. Y. Gong, and X. Liu. "Generic text summarization using relevance measure and latent semantic analysis," in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001, pp. 19–25.

6. S. Park, B. Cha, and D. U. An, "Automatic multi-document summarization based on clustering and non-negative matrix factorization," IETE Tech. Rev., Vol. 27, no. 2, pp. 167–78, 2010.

7. N. Chatterjee, and N. Yadav, "Fuzzy rough set-based sentence similarity measure and its application to text summarization," IETE Tech. Rev., Vol. 36, no. 5, pp. 517–25, 2019.

8. H. Jing. "Sentence reduction for automatic text summarization," in Sixth Applied Natural Language Processing Conference, (Seattle, Washington, USA), Association for Computational Linguistics, Apr. 2000, pp. 310–5.

9. K. Knight, and D. Marcu, "Summarization beyond sentence extraction: A probabilistic approach to sentence compression," Artif. Intell., Vol. 139, no. 1, pp. 91–107, 2002.

10. C. Zong, R. Xia, and J. Zhang. Text data mining. Singapore: Springer, 2021.

11. A. Siddharthan, A. Nenkova, and K. McKeown. "Syntactic simplification for improving content selection in multi-document summarization," in COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, (Geneva, Switzerland), COLING, 2004, pp. 896–902.

12. M. Shardlow, "A survey of automated text simplification," Int. J. Adv. Comput. Sci. Appl., Vol. 4, no. 1, pp. 58–70, 2014.
13. N. Chatterjee, and R. Agarwal. "DEPSYM: A lightweight syntactic text simplification approach using dependency trees," in Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021), co-located with SEPLN, 2021, pp. 42–56.

14. S. Štajner, and M. Popovic. "Can text simplification help machine translation?" in Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 2016, pp. 230–42.

15. E. Hasler, A. de Gispert, F. Stahlberg, A. Waite, and B. Byrne, "Source sentence simplification for statistical machine translation," Comput. Speech. Lang., Vol. 45, pp. 221–35, 2017.

16. T. Dadu, K. Pant, S. Nagar, F. A. Barbhuiya, and K. Dey. "Text simplification for comprehension-based question-answering," arXiv preprint arXiv:2109.13984, 2021.

17. D. Vickrey, and D. Koller. "Sentence simplification for semantic role labeling," in Proceedings of ACL-08: HLT, (Columbus, Ohio), Association for Computational Linguistics, June 2008, pp. 344–52.

18. R. Evans, and C. Orasan. "Sentence simplification for semantic role labelling and information extraction," in Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), (Varna, Bulgaria), 2019, pp. 285–94.

19. S. B. Silveira, and A. Branco. "Combining a double clustering approach with sentence simplification to produce highly informative multi-document summaries," in 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), 2012, pp. 482–89.

20. B. M. Rebello, G. L. d. Santos, C. R. B. d. Ávila, and A. d. S. B. Kida, "Effects of syntactic simplification on reading comprehension of elementary school students," Audiol. Commun. Res., Vol. 24, pp. 1–8, 2019.

21. R. Cervantes, and G. Gainer, "The effects of syntactic simplification and repetition on listening comprehension," TESOL Q., Vol. 26, no. 4, pp. 767–70, 1992.

22. L. Vanderwende, H. Suzuki, C. Brockett, and A. Nenkova, "Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion," Inf. Process. Manage., Vol. 43, no. 6, pp. 1606–18, 2007.

23. C. Finegan-Dollak, and D. R. Radev, "Sentence simplification, compression, and disaggregation for summarization of sophisticated documents," J. Assoc. Inf. Sci. Technol., Vol. 67, no. 10, pp. 2437–53, 2016.

24. F. Zaman, M. Shardlow, S.-U. Hassan, N. R. Aljohani, and R. Nawaz, "HTSS: A novel hybrid text summarisation and simplification architecture," Inf. Process. Manage., Vol. 57, no. 6, pp. 102351, 2020.

25. R. Vale, R. D. Lins, and R. Ferreira. "An assessment of sentence simplification methods in extractive text summarization," in Proceedings of the ACM Symposium on Document Engineering 2020, DocEng '20, (New York, NY, USA), Association for Computing Machinery, 2020.

26. A. Siddharthan. "Text simplification using typed dependencies: A comparison of the robustness of different generation strategies," in Proceedings of the 13th European Workshop on Natural Language Generation, 2011, pp. 2–11.

27. A. Siddharthan, and A. Mandya. "Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules," in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014, pp. 722–31.

28. D. Ferres, M. Marimon, H. Saggion, and A. AbuRa'ed. "YATS: Yet another text simplifier," in International Conference on Applications of Natural Language to Information Systems, Springer, 2016, pp. 335–42.

29. C. Scarton, A. P. Aprosio, S. Tonelli, T. M. Wanton, and L. Specia. "MUSST: A multilingual syntactic simplification tool," in Proceedings of the IJCNLP 2017, System Demonstrations, 2017, pp. 25–8.

30. A. Garain, A. Basu, R. Dawn, and S. K. Naskar. "Sentence simplification using syntactic parse trees," in 4th International Conference on Information Systems and Computer Networks (ISCON), 2019, pp. 672–6.

31. R. Evans, and C. Orasan, "Identifying signs of syntactic complexity for rule-based sentence simplification," Nat. Lang. Eng., Vol. 25, no. 1, pp. 69–119, 2019.

32. M. Lewis, et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, (Online), Association for Computational Linguistics, July 2020, pp. 7871–80.

33. S. Brin, and L. Page, "The anatomy of a large-scale hypertextual web search engine," Comput. Netw. ISDN Syst., Vol. 30, no. 1–7, pp. 107–117, 1998.

34. S. Bird, E. Klein, and E. Loper. Natural language processing with Python: Analyzing text with the Natural Language Toolkit. Sebastopol, CA: O'Reilly Media, Inc., 2009.

35. T. Wolf, et al. "Transformers: State-of-the-art natural language processing," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45.

36. K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom, "Teaching machines to read and comprehend," Adv. Neural. Inf. Process. Syst., Vol. 28, pp. 1693–701, 2015.
37. C.-Y. Lin. "Looking for a few good metrics: Automatic summarization evaluation - how many samples are enough?" in NTCIR, 2004.

38. T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi. "BERTScore: Evaluating text generation with BERT," in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020, OpenReview.net, 2020.

39. S. Li, D. Lei, P. Qin, and W. Y. Wang. "Deep reinforcement learning with distributional semantic rewards for abstractive summarization," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), (Hong Kong, China), Association for Computational Linguistics, Nov. 2019, pp. 6038–44.

40. J. P. Kincaid, R. P. Fishburne Jr, R. L. Rogers, and B. S. Chissom. "Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease formula) for Navy enlisted personnel," Tech. Rep., Naval Technical Training Command, Millington, TN: Research Branch, 1975.

41. D. Schicchi, G. L. Bosco, and G. Pilato, "Machine learning models for measuring syntax complexity of English text," in Biologically Inspired Cognitive Architectures Meeting, Advances in Intelligent Systems and Computing, Vol. 948, A. Samsonovich, Ed. Cham: Springer, 2019, pp. 449–54.

42. R. Agarwal, and N. Chatterjee. "Gradient boosted trees for identification of complex words in context," in Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021), co-located with SEPLN, 2021, pp. 12–28.

43. S. Stajner, and I. Hulpus. "When shallow is good enough: Automatic assessment of conceptual text complexity using shallow semantic features," in Proceedings of the 12th Language Resources and Evaluation Conference, (Marseille, France), European Language Resources Association, May 2020, pp. 1414–22.

44. S. Stajner, and I. Hulpus. "Automatic assessment of conceptual text complexity using knowledge graphs," in Proceedings of the 27th International Conference on Computational Linguistics, (Santa Fe, New Mexico, USA), Association for Computational Linguistics, Aug. 2018, pp. 318–30.

45. M. Afzal, F. Alam, K. M. Malik, and G. M. Malik, "Clinical context-aware biomedical text summarization using deep neural network: Model development and validation," J. Med. Internet Res., Vol. 22, no. 10, pp. e19810, 2020.

AUTHORS

Niladri Chatterjee is the chair professor of artificial intelligence at IIT Delhi. He is a professor of statistics and computer science in the Department of Mathematics, IIT Delhi, and in the School of IT and School of AI of IIT Delhi. His primary research areas are artificial intelligence, natural language processing, big data analytics, and statistical modelling. His association with IIT Delhi spans more than 22 years. Prior to that, he worked as a lecturer at University College London, and as a computer engineer at the Indian Statistical Institute, Calcutta. He has been a visiting professor in the Department of Informatics, University of Pisa, Italy. He has supervised ten PhDs and over a hundred master's theses in mathematics and computing.

Corresponding author. Email: niladri@maths.iitd.ac.in

Raksha Agarwal is a PhD scholar in the Department of Mathematics, IIT Delhi. Her primary research areas are natural language processing and machine learning, with a focus on abstractive text summarization. She holds a Master of Science in mathematics from IIT Delhi and a Bachelor of Science in mathematics from the University of Delhi. She is the recipient of the Shyama Prasad Mukherjee Fellowship awarded by the Council of Scientific and Industrial Research, India.

Email: raksha.agarwal@maths.iitd.ac.in
