Closed-Book Training To Improve Summarization Encoder Memory
arXiv:1809.04585v1 [cs.CL] 12 Sep 2018

Abstract

A good neural sequence-to-sequence summarization model should have a strong encoder that can distill and memorize the important information from long input texts so that the decoder can generate salient summaries based on the encoder's memory. In this paper, we aim to improve the memorization capabilities of the encoder of a pointer-generator ...

Example (shown in the figure alongside the abstract):

Original Text (truncated): a family have claimed the body of an infant who was discovered deceased and buried on a sydney beach last year, in order to give her a proper funeral. on november 30, 2014, two young boys were playing on maroubra beach when they uncovered the body of a baby girl buried under 30 centimetres of sand. now locals filomena d'alessandro and bill green have claimed the infant's body in order to provide her with a fitting farewell. 'we're local and my husband is a police officer and he's worked with many of the officers investigating it,' ms d'alessandro told daily mail australia. scroll down for video. a sydney family have claimed the body of a baby girl who was found buried on maroubra beach (pictured) on november 30, 2014. filomena d'alessandro and bill green have claimed the infant's remains, who they have named lily grace, in order to provide her with a fitting farewell. 'above all as a mother i wanted to do something for that little girl,' she added. since january the couple, who were married last year and have three children between them, have been trying to claim the baby after they heard police were going to give her a 'destitute burial' ...
Figure 2: Our 2-decoder summarization model with a pointer decoder and a closed-book decoder, both sharing a single encoder (during training; at inference time, we employ only the memory-enhanced encoder and the pointer decoder).
is modeled as:

    e_i^t = v^T tanh(W_h h_i + W_s s_t + b_attn)
    a^t = softmax(e^t),   c_t = Σ_i a_i^t h_i        (1)

where v, W_h, W_s, and b_attn are learnable parameters, h_i is the encoder's hidden state at the i-th encoding step, and s_t is the decoder's hidden state at the t-th decoding step. The distribution a_i^t can be seen as the amount of attention at decoding step t towards the i-th encoder state. The context vector c_t is therefore the sum of the encoder's hidden states weighted by the attention distribution a^t.

At each decoding step, the previous context vector c_{t-1} is concatenated with the current input x_t and fed through a non-linear recurrent function, along with the previous hidden state s_{t-1}, to produce the new hidden state s_t. The context vector c_t is then calculated according to Eqn. 1 and concatenated with the decoder state s_t to produce the logits for the vocabulary distribution P_vocab at decoding step t: P_vocab^t = softmax(V_2 (V_1 [s_t, c_t] + b_1) + b_2), where V_1, V_2, b_1, and b_2 are learnable parameters. To enable copying out-of-vocabulary words from the source text, a pointer similar to Vinyals et al. (2015) is built upon the attention distribution and controlled by the generation probability p_gen:

    p_gen^t = σ(U_c c_t + U_s s_t + U_x x_t + b_ptr)
    P_attn^t(w) = p_gen^t P_vocab^t(w) + (1 − p_gen^t) Σ_{i: w_i = w} a_i^t        (2)

where U_c, U_s, U_x, and b_ptr are learnable parameters, x_t and s_t are the input token and the decoder's state at the t-th decoding step, and σ is the sigmoid function. We can see p_gen as a soft gate that controls the model's behavior of copying from the source text with the attention distribution a_i^t versus sampling from the vocabulary with the generation distribution P_vocab.

3.2 Closed-Book Decoder

As shown in Eqn. 1, the attention distribution a_i^t depends on the decoder's hidden state s_t, which is derived from the decoder's memory state c_t. If c_t does not encode salient information from the source text, or encodes too much unimportant information, the decoder will have a hard time locating relevant encoder states with attention. However, as explained in the introduction, most gradients are back-propagated through the attention layer to the encoder's hidden states h_t, not directly to the final memory state, and thus provide little incentive for the encoder to memorize salient information in c_t.

Therefore, to enhance the encoder's memory, we add a closed-book decoder: a unidirectional LSTM decoder without an attention/pointer layer. The two decoders share a single encoder and word-embedding matrix, while out-of-vocabulary (OOV) words are simply represented as [UNK] for the closed-book decoder. The entire 2-decoder model is shown in Fig. 2. During training, we optimize the weighted sum of the negative log-likelihoods from the two decoders:

    L_XE = −(1/T) Σ_{t=1}^{T} ((1 − γ) log P_attn(w|x_{1:t}) + γ log P_cbdec(w|x_{1:t}))

where P_cbdec is the generation probability from the closed-book decoder. The mix ratio γ is tuned on the validation set.
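To make Eqns. 1–2 and the mixed loss of Section 3.2 concrete, here is a minimal NumPy sketch of a single pointer-generator decoding step. This is an illustrative toy re-implementation, not the paper's code: the shapes, the parameter-dictionary layout, and names such as `pointer_generator_step` and `V_ext` (the extended vocabulary: fixed vocabulary plus per-article OOV slots) are our own assumptions.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def pointer_generator_step(h, s_t, x_t, src_ids, params, V_ext):
    """One decoding step of Eqns. 1-2 (toy sizes, illustrative only).

    h:       (n_src, d) encoder hidden states h_i
    s_t:     (d,)       decoder hidden state at step t
    x_t:     (d,)       decoder input embedding at step t
    src_ids: (n_src,)   extended-vocabulary id of each source token
    V_ext:   extended vocabulary size (fixed vocab + source OOV slots)
    """
    v, W_h, W_s, b_attn = params["v"], params["W_h"], params["W_s"], params["b_attn"]
    # Eqn. 1: attention over encoder states, then the context vector.
    e = np.array([v @ np.tanh(W_h @ h_i + W_s @ s_t + b_attn) for h_i in h])
    a = softmax(e)
    c_t = a @ h
    # Generation distribution over the fixed vocabulary.
    logits = params["V2"] @ (params["V1"] @ np.concatenate([s_t, c_t])
                             + params["b1"]) + params["b2"]
    p_vocab = softmax(logits)
    # Soft copy/generate gate p_gen (Eqn. 2, first line).
    p_gen = 1.0 / (1.0 + np.exp(-(params["U_c"] @ c_t + params["U_s"] @ s_t
                                  + params["U_x"] @ x_t + params["b_ptr"])))
    # Eqn. 2: mix generation and copy probabilities in the extended vocab.
    p_final = np.zeros(V_ext)
    p_final[: len(p_vocab)] += p_gen * p_vocab
    for i, w in enumerate(src_ids):        # scatter-add the attention mass
        p_final[w] += (1.0 - p_gen) * a[i]
    return p_final, a

def mixed_xe_loss(logp_attn, logp_cbdec, gamma):
    """L_XE of Sec. 3.2: gamma-weighted sum of the two decoders' per-step NLLs."""
    T = len(logp_attn)
    return -((1.0 - gamma) * np.asarray(logp_attn)
             + gamma * np.asarray(logp_cbdec)).sum() / T
```

Because the copied attention mass is scattered into the extended vocabulary, source OOV tokens receive non-zero probability even though P_vocab alone can never generate them.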
                      ROUGE-1  ROUGE-2  ROUGE-L  MTR Full
PREVIOUS WORKS
(Nallapati16)?          35.46    13.30    32.65        -
pg (See17)              36.44    15.66    33.42    16.65
OUR MODELS
pg (baseline)           36.70    15.71    33.74    16.94
pg + cbdec              38.21    16.45    34.70    18.37
RL + pg                 37.02    15.79    34.00    17.55
RL + pg + cbdec         38.58    16.57    35.03    18.86

Table 1: ROUGE F1 and METEOR scores (non-coverage) on the CNN/Daily Mail test set for previous works and our models. 'pg' is the pointer-generator baseline, and 'pg + cbdec' is our 2-decoder model with a closed-book decoder (cbdec). The model marked with ? is trained and evaluated on the anonymized version of the data.

                      ROUGE-1  ROUGE-2  ROUGE-L  MTR Full
PREVIOUS WORKS
pg (See17)              39.53    17.28    36.38    18.72
RL? (Paulus17)          39.87    15.82    36.90        -
OUR MODELS
pg (baseline)           39.22    17.02    35.95    18.70
pg + cbdec              40.05    17.66    36.73    19.48
RL + pg                 39.59    17.18    36.16    19.70
RL + pg + cbdec         40.66    17.87    37.06    20.51

Table 2: ROUGE F1 and METEOR scores (with-coverage) on the CNN/Daily Mail test set. The coverage mechanism (See et al., 2017) is used in all models except the RL model (Paulus et al., 2018). The model marked with ? is trained and evaluated on the anonymized version of the data.
3.3 Reinforcement Learning

In the reinforcement learning setting, our summarization model is the policy network that generates words to form a summary. Following Paulus et al. (2018), we use a self-critical policy gradient training algorithm (Rennie et al., 2016; Williams, 1992) for both our baseline and our 2-decoder model. For each passage, we sample a summary y^s = w^s_{1:T+1} and greedily generate a summary ŷ = ŵ_{1:T+1} by selecting the word with the highest probability at each step. These two summaries are then fed to a reward function r, which in our case is the ROUGE-L score. The RL loss function is:

    L_RL = (1/T) (r(ŷ) − r(y^s)) Σ_{t=1}^{T} log P_attn(w^s_{t+1} | w^s_{1:t})        (3)

where the reward for the greedily-generated summary, r(ŷ), acts as a baseline to reduce variance. We train our reinforced model using a mixture of Eqn. 3 and Eqn. 2, since Paulus et al. (2018) showed that a pure RL objective leads to summaries that receive high rewards but are not fluent. The final mixed loss function for RL is L_XE+RL = λ L_RL + (1 − λ) L_XE, where the value of λ is tuned on the validation set.

4 Experimental Setup

We evaluate our models mainly on the CNN/Daily Mail dataset (Hermann et al., 2015; Nallapati et al., 2016), a large-scale, long-paragraph summarization dataset. It contains online news articles (781 tokens or ~40 sentences on average) paired with human-generated summaries (56 tokens or 3.75 sentences on average). The entire dataset has 287,226 training pairs, 13,368 validation pairs, and 11,490 test pairs. We use the same version of the data as See et al. (2017), i.e., the original text with no preprocessing to replace named entities. We also use DUC-2002, another long-paragraph summarization dataset of news articles, which has 567 articles and 1~2 summaries per article.

All the training details (e.g., vocabulary size, RNN dimension, optimizer, batch size, learning rate, etc.) are provided in the supplementary materials.

5 Results

We first report our evaluation results on the CNN/Daily Mail dataset. As shown in Table 1, our 2-decoder model achieves statistically significant improvements^1 over the pointer-generator baseline (pg), with +1.51, +0.74, and +0.96 points in ROUGE-1, ROUGE-2, and ROUGE-L (Lin, 2004), and +1.43 points in METEOR (Denkowski and Lavie, 2014). In the reinforced setting, our 2-decoder model still maintains significant (p < 0.001) ...

                      ROUGE-1  ROUGE-2  ROUGE-L  MTR Full
pg (See17)              37.22    15.78    33.90    13.69
pg (baseline)           37.15    15.68    33.92    13.65
pg + cbdec              37.59    16.84    34.43    13.82
RL + pg                 39.92    16.71    36.13    15.12
RL + pg + cbdec         41.48    18.69    37.71    15.88

Table 3: ROUGE F1 and METEOR scores on DUC-2002 (test-only transfer setup).

^1 Our improvements in Table 1 are statistically significant with p < 0.001 (using a bootstrapped randomization test with 100k samples (Efron and Tibshirani, 1994)) and have a 95% ROUGE-significance interval of at most ±0.25.
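The self-critical objective from Section 3.3 (Eqn. 3) and the XE+RL mixture reduce to a few lines once per-step log-probabilities and sequence-level rewards are available. Below is a minimal sketch with our own function names; in the paper's setup the two rewards would be ROUGE-L scores of the sampled and greedy summaries.

```python
import numpy as np

def self_critical_loss(logp_sample, r_sample, r_greedy):
    """Eqn. 3: (1/T) * (r(yhat) - r(y^s)) * sum_t log P(w^s_{t+1} | w^s_{1:t}).

    logp_sample: per-step log-probs of the *sampled* summary, shape (T,)
    r_sample:    reward (e.g. ROUGE-L) of the sampled summary
    r_greedy:    reward of the greedily-decoded summary (the baseline)
    """
    T = len(logp_sample)
    # Minimizing this raises the probability of samples that beat the baseline.
    return (r_greedy - r_sample) * np.asarray(logp_sample).sum() / T

def mixed_loss(l_rl, l_xe, lam):
    """L_{XE+RL} = lambda * L_RL + (1 - lambda) * L_XE."""
    return lam * l_rl + (1.0 - lam) * l_xe
```

When the sampled summary scores higher than the greedy one, the coefficient (r_greedy − r_sample) is negative, so minimizing the loss pushes the sampled tokens' log-probabilities up, and vice versa.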
Example (figure residue; only partially recoverable):

Reference summary:
mitchell moffit and greg brown from asapscience present theories.
different personality traits can vary according to expectations of parents.
beyoncé, hillary clinton and j. k. rowling are all oldest children.

Pointer-Gen baseline:
the kardashians are a strong example of a large celebrity family where the siblings share very different personality traits.
on asapscience on youtube, the pair discuss how being the first, middle, youngest, or an only child affects us.

Similarity to reference for varying mix ratio γ:

                            similarity
pg (baseline)                 0.817
pg + cbdec (γ = 1/2)          0.869
pg + cbdec (γ = 2/3)          0.889
pg + cbdec (γ = 5/6)          0.872
pg + cbdec (γ = 10/11)        0.860
We presented a 2-decoder sequence-to-sequence architecture for summarization with a closed-book decoder that helps the encoder to better memorize salient information from the source text. On the CNN/Daily Mail dataset, our proposed model significantly outperforms the pointer-generator baselines in terms of ROUGE and METEOR scores (in both a cross-entropy (XE) setup and a reinforcement learning (RL) setup). It also achieves improvements in a test-only transfer setup on the DUC-2002 dataset in both XE and RL cases. We further showed that our 2-decoder model indeed ...

References

Jackie Chi Kit Cheung and Gerald Penn. 2014. Unsupervised sentence enhancement for automatic summarization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 775–786.

Sumit Chopra, Michael Auli, and Alexander M Rush. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 93–98.

James Clarke and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 31:399–429.

Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 376–380.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159.

Bradley Efron and Robert J Tibshirani. 1994. An introduction to the bootstrap. CRC Press.

Katja Filippova, Enrique Alfonseca, Carlos A Colmenares, Lukasz Kaiser, and Oriol Vinyals. 2015. Sentence compression by deletion with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 360–368.

Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 340–348. ACL.

Shima Gerani, Yashar Mehdad, Giuseppe Carenini, Raymond T Ng, and Bita Nejat. 2014. Abstractive summarization of product reviews using discourse structure. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, volume 14, pages 1602–1613.

George Giannakopoulos. 2009. Automatic summarization from multiple documents. Ph.D. dissertation.

Jiatao Gu, Baotian Hu, Zhengdong Lu, Hang Li, and Victor OK Li. 2016a. Guided sequence-to-sequence learning with external rule memory. In International Conference on Learning Representations.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor OK Li. 2016b. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

Stefan Henß, Margot Mieskes, and Iryna Gurevych. 2015. A reinforcement learning approach for adaptive single- and multi-document summarization. In GSCL, pages 3–12.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.

Hongyan Jing and Kathleen R. McKeown. 2000. Cut and paste based text summarization. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, NAACL 2000, pages 178–185, Stroudsburg, PA, USA. Association for Computational Linguistics.

Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.

Kevin Knight and Daniel Marcu. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1):91–107.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, volume 8. Barcelona, Spain.

S. Liu, Z. Zhu, N. Ye, S. Guadarrama, and K. Murphy. 2017. Improved image captioning via policy gradient optimization of SPIDEr. In International Conference on Computer Vision.

Yishu Miao and Phil Blunsom. 2016. Language as a latent variable: Discrete generative models for sentence compression. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.

Ramesh Nallapati, Bowen Zhou, Çağlar Gu̇lçehre, Bing Xiang, et al. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Computational Natural Language Learning.

Mohammad Norouzi, Samy Bengio, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans, et al. 2016. Reward augmented maximum likelihood for neural structured prediction. In Advances in Neural Information Processing Systems, pages 1723–1731.

Ramakanth Pasunuru and Mohit Bansal. 2018. Multi-reward reinforced summarization with saliency and entailment. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In International Conference on Learning Representations.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.

Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In International Conference on Learning Representations.
Steven J Rennie, Etienne Marcheret, Youssef Mroueh,
Jarret Ross, and Vaibhava Goel. 2016. Self-critical
sequence training for image captioning. In Com-
puter Vision and Pattern Recognition (CVPR).
Alexander M Rush, Sumit Chopra, and Jason Weston.
2015. A neural attention model for abstractive sen-
tence summarization. In Empirical Methods in Nat-
ural Language Processing.
Abigail See, Peter J. Liu, and Christopher D. Manning.
2017. Get to the point: Summarization with pointer-
generator networks. In Proceedings of the 55th An-
nual Meeting of the Association for Computational
Linguistics. Association for Computational Linguis-
tics.
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014.
Sequence to sequence learning with neural net-
works. In Advances in neural information process-
ing systems, pages 3104–3112.
Sho Takase, Jun Suzuki, Naoaki Okazaki, Tsutomu
Hirao, and Masaaki Nagata. 2016. Neural head-
line generation on abstract meaning representation.
In Proceedings of the 2016 Conference on Empiri-
cal Methods in Natural Language Processing, pages
1054–1059.
Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly.
2015. Pointer networks. In Advances in Neural In-
formation Processing Systems, pages 2692–2700.
Lu Wang, Hema Raghavan, Vittorio Castelli, Radu Flo-
rian, and Claire Cardie. 2013. A sentence com-
pression based framework to query-focused multi-
document summarization. In ACL.
Mingxuan Wang, Zhengdong Lu, Hang Li, and Qun
Liu. 2016. Memory-enhanced decoder for neural
machine translation. In Proceedings of the 54th An-
nual Meeting of the Association for Computational
Linguistics.
Ronald J Williams. 1992. Simple statistical gradient-
following algorithms for connectionist reinforce-
ment learning. Machine learning, 8(3-4):229–256.
Hao Xiong, Zhongjun He, Xiaoguang Hu, and Hua Wu.
2018. Multi-channel encoder for neural machine
translation. In AAAI.
David Zajic, Bonnie Dorr, and Richard Schwartz. 2004.
Bbn/umd at duc-2004: Topiary. In Proceedings
of the HLT-NAACL 2004 Document Understanding
Workshop, Boston, pages 112–119.
Wenyuan Zeng, Wenjie Luo, Sanja Fidler, and Raquel
Urtasun. 2016. Efficient summarization with
read-again and copy mechanism. arXiv preprint
arXiv:1611.03382.
Supplementary Material

A Coverage Mechanism

See et al. (2017) apply a coverage mechanism to the pointer-generator in order to alleviate repetition. They maintain a coverage vector c^t as the sum of the attention distributions over all previous decoding steps 1 : t−1. This vector is incorporated into the calculation of the attention distribution at the current step t:

    c^t = Σ_{t'=0}^{t−1} a^{t'}
    e_i^t = v^T tanh(W_h h_i + W_s s_t + W_c c_i^t + b_attn)        (4)

where W_h, W_s, W_c, and b_attn are learnable parameters. They define the coverage loss and combine it with the primary loss to form a new loss function, which is used to fine-tune a converged pointer-generator model:

    loss_cov^t = Σ_i min(a_i^t, c_i^t)
    L_total = (1/T) Σ_{t=1}^{T} (− log P_pg^t(w|x_{1:t}) + λ loss_cov^t)        (5)

B Reinforcement Learning

To overcome the exposure bias (Bengio et al., 2015) between training and testing, previous works (Ranzato et al., 2016; Paulus et al., 2018) use reinforcement learning algorithms to directly optimize metric scores for summarization models. In this setting, the generation of discrete words in a sentence is a sequence of actions. The decision of which action to take is based on a policy network π_θ, which outputs a distribution over all possible actions at that step. In our case, π_θ is simply our summarization model.

The process of generating a summary s given the source passage P can be summarized as follows. At each time step t, we sample a discrete action w_t ∈ V (a word in the vocabulary) based on the distribution from the policy π_θ(P, s_t), where s_t = w_{1:t−1} is the sequence of actions sampled in previous steps. When we reach the end of the sequence at terminal step T (an end-of-sentence marker is sampled from π_θ), we feed the entire sequence s_T = w_{1:T} into a reward function and get a reward R(w_{1:T}|P).

In typical reinforcement learning, an agent with a policy receives rewards at intermediate steps, and a discount factor is used to balance long-term and short-term rewards. In our task, there are no intermediate rewards, only a final reward at the terminal step T. Therefore, the value function of a partial sequence c_t = w_{1:t} is the expected reward at the terminal step:

    V(w_{1:t}|P) = E_{w_{t+1:T}}[R(w_{1:t}; w_{t+1:T}|P)]        (6)

The objective of policy gradient is to maximize the average value starting from the initial state:

    J(θ) = (1/N) Σ_{n=1}^{N} V(w_0|P)        (7)

where N is the total number of examples in the training set. The gradient of V(w_0|P) is computed as below (Williams, 1992):

    E_{w_{2:T}}[Σ_{t=1}^{T} Σ_{w_t ∈ V} ∇_θ π_θ(w_{t+1}|w_{1:t}, P) × Q(w_{1:t}, w_{t+1}|P)]        (8)

where Q(w_{1:t}, w_{t+1}|P) is the state-action value for a particular action w_{t+1} at state w_{1:t} given the source passage P, and should be calculated as follows:

    Q(w_{1:t}, w_{t+1}|P) = E_{w_{t+2:T}}[R(w_{1:t+1}; w_{t+2:T}|P)]        (9)

Previous work (Liu et al., 2017) adopts Monte Carlo rollout to approximate this expectation. Here we simply use the terminal reward R(w_{1:T}|P) as an estimate with large variance. To compensate for the variance, we use a baseline estimator that does not change the validity of the gradients (Williams, 1992). We further follow Paulus et al. (2018) and use the self-critical policy gradient training algorithm (Rennie et al., 2016; Williams, 1992). For each iteration, we sample a summary y^s = w^s_{1:T+1} and greedily generate a summary ŷ = ŵ_{1:T+1} by selecting the word with the highest probability at each step. These two summaries are then fed to a reward function r that evaluates their closeness to the ground truth. We choose the ROUGE-L score as the reward function r, as in previous work (Paulus et al., 2018). The RL loss function is as follows:

    L_RL = (1/T) (r(ŷ) − r(y^s)) Σ_{t=1}^{T} log π_θ(w^s_{t+1}|w^s_{1:t})        (10)

where the reward for the greedily-generated summary, r(ŷ), acts as a baseline to reduce variance.
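The coverage vector and per-step coverage loss of Eqns. 4–5 can be sketched in a few lines. This is an illustrative NumPy version with our own names, assuming the full history of attention distributions is available:

```python
import numpy as np

def coverage_losses(attn_history):
    """Per-step coverage loss (Eqn. 5): sum_i min(a_i^t, c_i^t), where
    c^t is the running sum of attention distributions over steps
    0 .. t-1 (Eqn. 4).

    attn_history: (T, n_src) array; row t is the attention distribution a^t.
    Returns an array of T per-step coverage losses.
    """
    n_src = attn_history.shape[1]
    c = np.zeros(n_src)                 # coverage vector, starts at zero
    losses = []
    for a in attn_history:
        losses.append(np.minimum(a, c).sum())   # overlap with past attention
        c += a                                  # accumulate coverage
    return np.array(losses)
```

Re-attending to an already-covered source position incurs loss, while attention spread over fresh positions is free, which is exactly what discourages the repetition the mechanism targets.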
C Training Details
We keep most of the hyper-parameters and settings the same as in See et al. (2017). We use a bi-directional LSTM encoder unrolled for up to 400 steps, and a uni-directional LSTM unrolled for up to 100 steps for both decoders. All of our encoder and decoder LSTMs have a hidden dimension of 256, and the word-embedding dimension is set to 128. Our pre-set vocabulary has a total of 50k word tokens, including special tokens for start, end, and out-of-vocabulary (OOV) signals. The embedding matrix
is learned from scratch and shared between the en-
coder and two decoders.
All of our reported teacher-forcing models are trained with Adagrad (Duchi et al., 2011), with a learning rate of 0.15 and an initial accumulator value of 0.1. The gradients are clipped to a maximum norm of 2.0. The batch size is set to 16.
Our model with the closed-book decoder converged in about 200,000 to 240,000 iterations, and achieved the best result on the validation set after another 2k~3k iterations once the coverage loss was added. We restore the best checkpoints (pre-coverage and post-coverage) and apply policy gradient (RL). For this phase of training, we choose the Adam optimizer (Kingma and Ba, 2015) for its time efficiency, and the learning rate is set to 0.000001. The RL-XE mixed-loss ratio (λ) is set to 0.9984.
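The gradient clipping used above (global norm capped at 2.0) amounts to rescaling all gradients by a common factor whenever their joint L2 norm exceeds the cap. A framework-agnostic NumPy sketch of this step (our own helper, not the training code itself, which would use a deep-learning framework's built-in clipping):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=2.0):
    """Rescale a list of gradient arrays so their global L2 norm is at
    most `max_norm` (2.0 in our training). Returns the (possibly
    rescaled) gradients and the pre-clipping global norm."""
    global_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm
```

For example, a single gradient [3.0, 4.0] has norm 5.0 and is scaled down to norm 2.0, preserving its direction.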
D Examples
We provide more example summaries generated by our 2-decoder model and the pointer-generator baseline (see Fig. 5, Fig. 6, and Fig. 7 below).
Original Text (truncated): lionel messi should be fit to play in barcelona's
la liga game at celta vigo on sunday despite a scare over a possible foot
injury , centre back gerard pique said on wednesday . messi , the top
scorer in la liga , did not feature in either of argentina 's friendlies during
the international break after sustaining a blow to his right foot in last
month 's ` clasico ' against real madrid . ' i am optimistic about messi ,
i have spoken to him , ' pique told reporters at a promotional event .
lionel messi -lrb- right -rrb- should be available for barcelona 's trip to
celta vigo , according to gerard pique . ' i think that he can play at
balaidos -lrb- celta 's stadium -rrb- , ' added the spain international , who
came through barca 's youth academy with messi . ` in the end it is up
to how he feels during the rest of the week . the medical staff are those
who should decide . my feeling is that he will play . barca 's 2-1 win at
home to real stretched their lead over their arch rivals at the top of
la liga to four points with 10 games left . second-placed real , who host
granada on sunday , have stuttered in recent weeks and the 2-1 defeat at
the nou camp was their third loss in their last four outings in all
competitions .
Reference summary:
lionel messi didn't feature in either of Argentina's recent friendlies .
messi suffered a foot injury in barcelona's win over real madrid last month .
barca sits four points clear of real in la liga with 10 games remaining.
Pointer-Gen baseline:
lionel messi should be available for barcelona‘s trip to celta vigo.
the spain international did not feature in either of argentina‘s friendlies during
the international break.
messi, the top scorer in la liga, did not feature in either of argentinal‘s friendlies.
Figure 5: The pointer-generator repeats itself (italic) and makes a factual error (red), while the 2-decoder model (pointer-generator + closed-book decoder) generates a summary that recovers the salient information (highlighted) in the original text.
Original Text (truncated): a waitress has revealed how the new zealand
prime minister had repeatedly given her unwanted attention while she was
working at a cafe in auckland frequented by him and his wife . published
on the daily blog on wednesday , the anonymous woman has recounted
how john key kept playfully pulling her hair despite being told to stop
during election time last year . however mr key defended his pranks as
' a bit of banter ' and said he had already apologised for his actions ,
stuff.co.nz reports . a waitress has revealed how the new zealand prime
minister had repeatedly given her unwanted attention while she was
working at a cafe in auckland frequented by him and his wife bronagh
( pictured together ) . the waitress had reportedly been working at a cafe
called rosie ( pictured ) in parnell , east of auckland . the waitress -
believed to be working at a cafe called rosie in parnell , east of auckland
- wrote about she how made it very clear that she was unimpressed by
mr key 's gestures . ' he was like the school yard bully tugging on the
little girls ' hair trying to get a reaction , experiencing that feeling of
power over her , ' she wrote on the blog . mr key kept being persistent
with his hair-pulling antics , despite being told by his wife bronagh to
stop . after dealing with the practical jokes over the six months he had
visited the cafe , the waitress finally lost her cool ...
Reference summary:
amanda bailey , 26 , says she does n't regret going public with her story .
the waitress revealed in a blog how john key kept pulling her hair .
she wrote that she gained unwanted attention from him last year at a cafe .
ms bailey said mr key kept touching her hair despite being told to stop .
owners say they were disappointed she never told them of her concerns .
they further stated mr key is popular among the cafe staff .
the prime minister defended his actions , saying he had already apologised .
he also said his pranks were ` all in the context of a bit of banter '
the waitress was working at a cafe called rosie in parnell , east of auckland .
Pointer-Generator baseline:
waitress was working at a cafe in auckland frequented by him and his wife .
she was working at a cafe called rosie in parnell , east of auckland .
mr key defended his pranks as ' a bit of banter ' and said he had already
apologised .
Pointer-Generator + closed-book decoder:
waitress has revealed how john key kept playfully pulling her hair despite being
told to stop during election time last year .
however mr key defended his pranks as ' a bit of banter ' and said he had already
apologised for his actions , stuff.co.nz reports .
Figure 6: The pointer-generator fails to address the most salient information in the original text, mentioning only a few unimportant points (where the waitress works), while the 2-decoder model (pointer-generator + closed-book decoder) generates a summary that recovers the salient information (highlighted) in the original text.
Original Text (truncated): the suicides of five young sailors who served
on the same base over two years has unearthed a shocking culture of ice
taking , binge drinking , bullying and depression within the australian
navy . the sailors were stationed or had been stationed at the west
australian port of hmas stirling off the coast of rockingham , south of
perth . their families did not learn of their previous attempts to take their
own lives and their drug use until after their deaths , according to abc 's
7.30 program . scroll down for video . stuart addison was serving on
hmas stirling off the coast of western australia when he took his own
life . five of the sailors who committed suicide had been serving with
the australian navy on hmas stirling ...
Reference summary:
five sailors took their own lives while serving on wa 's hmas stirling .
suicides happened over two years and some had attempted it before .
stuart addison 's family did n't know about his other attempts until his death .
it was a similar case for four other families , including stuart 's close friends .
revelations of ice use , binge drinking and depression have also emerged .
Pointer-Gen baseline:
stuart addison was serving on hmas stirling off the coast of western australia .
he was serving on hmas stirling off the coast of rockingham , south of perth .
their families did not learn of their previous attempts to take their own lives
and their drug use until after their deaths .
their families did not learn of their previous attempts to take their own lives
and their drug use until after their deaths .
Pointer-Gen + closed-book decoder:
the suicides of five young sailors who served on the same base over two years
has unearthed a shocking culture of ice taking , binge drinking , bullying and
depression within the australian navy .
the sailors were stationed at the west australian port of hmas stirling off the
coast of rockingham , south of perth .
their families did not learn of their previous attempts to take their own lives
and their drug use until after their deaths , according to abc 's 7.30 program .
Figure 7: The pointer-generator (non-coverage) repeats itself (italic), while the 2-decoder model (pointer-generator + closed-book decoder) generates a summary that recovers the salient information (highlighted) in the original text as well as in the reference summary.