Open Domain Question Answering Based On Text Enhanced Knowledge Graph With Hyperedge Infusion
[Figure 1: an example question with its knowledge graph fragment, containing entities such as Cleary, library science, master's degree, gender, female, profession, and writer.]
Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1475–1481, November 16–20, 2020. ©2020 Association for Computational Linguistics
as hyperedges connecting entities in the text, and then employ hypergraph convolutional networks (HGCN) (Feng et al., 2019; Yadati et al., 2019) to further update the entity states. Finally, the model predicts the final answers.

Our highlights are summarized as follows: 1) We novelly treat documents as high-order relations (hyperedges) connecting the entities mentioned in them. 2) We apply hypergraph convolutional networks to reason and propose a dual-step attention to capture the importance of different entities and documents. 3) Extensive experiments conducted on the widely used WebQuestionsSP (Yih et al., 2016) with different KB settings demonstrate that our model is effective.
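To make point 1) concrete: a document that mentions several entities becomes one hyperedge over all of them, and the corpus induces an entity-document incidence matrix. A minimal sketch (the toy documents and entity names below are invented for illustration, reusing entities from the example figure):

```python
# Hypothetical toy corpus: each document lists the entities linked in it.
docs = {
    "d1": ["Cleary", "master's degree", "writer"],
    "d2": ["Cleary", "female"],
}
entities = sorted({e for ents in docs.values() for e in ents})

# Incidence matrix M: M[i][j] = 1 iff entity i is mentioned in document j,
# i.e. document j is one hyperedge containing all of its linked entities.
M = [[1 if e in ents else 0 for ents in docs.values()] for e in entities]

# Each column is a hyperedge; "d1" connects three entities at once,
# a relation that no ordinary (binary) KB edge can express.
sizes = [sum(col) for col in zip(*M)]
print(sizes)  # [3, 2] entities per hyperedge
```

A binary edge can only link entity pairs, so a document mentioning k entities would otherwise need k(k-1)/2 edges and still lose the fact that the mentions co-occur in one context.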
2 Related Work

The combination of knowledge base and text in QA is a challenging task, which has attracted many researchers' attention. Das et al. (2017) extend universal schema to question answering and employ Key-Value Memory Networks to process text and KB. Sun et al. (2018) regard documents as heterogeneous nodes and combine them with entities in the KB to form a uniform graph. The model proposed by Xiong et al. (2019) contains a graph-attention-based KB reader and a knowledge-aware text reader. Other work focuses on retrieving a small graph that contains just the question-related information (Sun et al., 2019) and on the interpretability of QA over KB and text (Sydorova et al., 2019). These methods do not consider the high-order relationships among the entities contained in the text. This paper instead regards text as hyperedges and further employs hypergraph convolutional networks.

Hypergraph convolutional networks (Feng et al., 2019; Yadati et al., 2019) utilize a hypergraph structure rather than a general graph to fully represent the high-order correlations among data, and hypergraph attention (Bai et al., 2019) further enhances the representation learning ability with an attention module.

Figure 2: The overview of the model. We utilize the semantic information mentioned in the text to enrich the entity representation, and novelly treat text as hyperedges to complement the relations in the incomplete KB.

3 Model

3.1 Task Definition

To maintain consistency and fairness, we adopt the same setting as Sun et al. (2018), which builds a subgraph for each question. Specifically, given a question q, a question-related subgraph K = (V, E, T) is retrieved by Personalized PageRank (Haveliwala, 2002), where V is the entity set, E is the relation set, and T contains a set of triples (v_h, r, v_t) indicating that there is a relation r ∈ E between v_h ∈ V and v_t ∈ V. Also, a relevant text corpus D = {d_1, d_2, ..., d_|D|} is retrieved from Wikipedia by an off-the-shelf document retriever (Chen et al., 2017), where d_i = (w_1, w_2, ..., w_|d_i|) represents a document and the entities mentioned in the documents have been linked. The task requires extracting answers from all KB and document entities. The overview of our model is shown in Figure 2.

3.2 Input Encoder

Query and Text Encoder: Let X_q ∈ R^(|q|×n) and X_d ∈ R^(|d|×n) be the embedding matrices of query q and document d ∈ D, where n is the embedding dimension. Bi-LSTM networks (Hochreiter and Schmidhuber, 1997) are applied to encode the query and document separately, giving hidden states H_q ∈ R^(|q|×h) and H_d ∈ R^(|d|×h), where h is the hidden dimension of the bi-LSTM. Then we compute the representations of the query, h_q, and of each document, h_d, with an attention mechanism:

h_q = H_q^T softmax(f_q(H_q)) ∈ R^(h×1)

h_d = H_d^T softmax(f_d(H_d H_q^T)) ∈ R^(h×1)
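The pooled query representation above is a convex combination of token states, weighted by a learned attention distribution. A minimal NumPy sketch of the h_q formula, with a random weight vector standing in for the trained linear layer f_q (all variable names here are illustrative, not from the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

q_len, h = 10, 100                   # |q| tokens, hidden size h
H_q = rng.normal(size=(q_len, h))    # stand-in for bi-LSTM hidden states
w_fq = rng.normal(size=(h, 1))       # f_q: linear map from h dims to 1 dim

# h_q = H_q^T softmax(f_q(H_q)): each token contributes its hidden state,
# weighted by how much attention it receives.
attn = softmax(H_q @ w_fq, axis=0)   # shape (|q|, 1), sums to 1 over tokens
h_q = H_q.T @ attn                   # shape (h, 1)

assert h_q.shape == (h, 1)
```

The document representation h_d follows the same pattern, except that the attention logits also condition on the query states.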
where T represents matrix transposition, f_q is a linear network which converts the h dimension to 1 dimension, and f_d converts the |q| dimension to 1 dimension.

KB Encoder: Each entity v ∈ V is initialized with a pre-trained knowledge graph embedding x_v ∈ R^(n×1). Each relation is initialized with both a semantic vector and a KG embedding. Specifically, for relation r ∈ E with KG embedding x_r ∈ R^(n×1), we tokenize it as r = (w_1, w_2, ..., w_|r|) and feed it into a bi-LSTM layer with word embeddings to get the hidden states H_r ∈ R^(|r|×h), then calculate the representation h_r as follows:

H_rq = softmax(H_r H_q^T) H_q ∈ R^(|r|×h)

H_r′ = [H_r; H_rq] ∈ R^(|r|×2h)

h_r′ = H_r′^T softmax(f_r1(H_r′)) ∈ R^(2h×1)

h_r = f_r2([h_r′; x_r]) ∈ R^(h×1)

The model improves the incomplete KB by enriching the entity representations and adding hyperedges, and applies GCN and HGCN for reasoning.

GCN for Entity-Enriched KB: To utilize the rich semantic information contained in the text, we construct a binary matrix M, where M_v^d ∈ R^(|d|×1) indicates the span of entity v in document d, and pass information from documents to entities to form a text-aware entity representation x_v′, which is then concatenated with x_v to form the initial node state h_v^(0):

x_v′ = Σ_{d ∈ D_v} H_d^T M_v^d ∈ R^(h×1)

h_v^(0) = f_v([x_v; x_v′]) ∈ R^(h×1)

where D_v is the set of documents linked to entity v, and f_v converts the h+n dimension to the h dimension. Then the model learns the entity representation by aggregating the features of connected entities:

h_v^(l1+1) = W_1 h_v^(l1) + Σ_{(v_i, r_i) ∈ N_v} α_i W_2 [h_{v_i}^(l1); h_{r_i}] ∈ R^(h×1)

α_i = σ(h_q^T f_a([h_{v_i}^(l1); h_{r_i}]))

where W_1 ∈ R^(h×h) and W_2 ∈ R^(h×2h) are learnable parameters, N_v represents the set of triples adjacent to entity v, f_a converts the 2h dimension to the h dimension, l1 indexes the current GCN layer (there are L1 layers in total), and σ is the sigmoid function.

HGCN for Hypergraph-Formed Text: The model regards plain text as hyperedges connecting the entities in the text, to complement the missing relations in the KB. HGCN is employed to encode the hypergraph-formed text, and a dual-step attention captures the importance of different entities and documents. Formally, at layer l2, the model first transfers the entity features to the connected hyperedges to form the document representation:

h_d′^(l2+1) = W_3 h_d′^(l2) + Σ_{v_i ∈ N_d} β_i W_4 h_{v_i}′^(l2) ∈ R^(h×1)

and then aggregates the hyperedge features back to the connected entities:

h_v^(l2+1) = W_5 h_v^(l2) + Σ_{d_i ∈ D_v} γ_i W_6 h_{d_i}′^(l2+1) ∈ R^(h×1)

γ_i = σ(h_q^T h_{d_i}′^(l2+1))

where W_5, W_6 ∈ R^(h×h) are learnable parameters.

3.4 Answer Prediction

After L1 GCN layers and L2 HGCN layers, the model finally predicts the probability of each entity being the answer:

p_v = σ(f_out(h_v^(L2)))

where f_out converts the h dimension to 1 dimension.

4 Experiments

4.1 Dataset

WebQuestionsSP (Yih et al., 2016) is a multi-answer QA dataset which contains 4737 questions. In our experiments we adopt the dataset¹ preprocessed by Sun et al. (2018). Table 1 shows the statistics of the dataset and the retrieved subgraphs for the questions, including KB and linked text. In particular, the average number of linked entities in the documents is 4.6, which illustrates the rationality of adopting hyperedges.

¹ https://github.com/OceanskySun/GraftNet
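The two reasoning stages described in Section 3 (a GCN step over KB neighbours, then the dual-step HGCN update from entities to hyperedges and back) can be sketched schematically in NumPy. This is a shape-level illustration only: the weights are random, the entity-level attention β_i is replaced by uniform aggregation for brevity, and none of the names come from the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

h, n_ent, n_doc = 4, 5, 3
H_ent = rng.normal(size=(n_ent, h))           # entity states h_v
H_doc = rng.normal(size=(n_doc, h))           # document (hyperedge) states h_d'
M = rng.integers(0, 2, size=(n_ent, n_doc))   # entity-document incidence
W3, W4, W5, W6 = (rng.normal(size=(h, h)) for _ in range(4))
h_q = rng.normal(size=(h, 1))                 # pooled query vector

# Step 1: entities -> hyperedges. Each document pools the states of the
# entities it contains (uniform weights stand in for the attention beta_i).
H_doc_new = H_doc @ W3.T + (M.T @ H_ent) @ W4.T

# Step 2: hyperedges -> entities, gated per document by the query attention
# gamma_i = sigmoid(h_q^T h_d'), matching the update equations above.
gamma = sigmoid(H_doc_new @ h_q)              # shape (n_doc, 1)
H_ent_new = H_ent @ W5.T + (M @ (gamma * H_doc_new)) @ W6.T

assert H_ent_new.shape == (n_ent, h)
```

Because each hyperedge pools every entity it contains before scattering back, one HGCN layer already propagates information between all co-mentioned entities, which a pairwise graph would need multiple hops to achieve.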
Dataset | questions (train / dev / test) | avg candidates (KB / Text / KB+Text) | avg linked documents | avg answers | avg entities in documents
WebQSP | 2848 / 250 / 1639 | 384.6 / 141.6 / 515.1 | 43.6 | 11.2 | 4.6

Table 1: Statistics of the WebQSP dataset and the retrieved subgraphs.

Model | KB-only | Text-only | KB+Text
KVMem | 46.7 / 38.6 | 23.2 / 13.0 | 40.5 / 30.9
GraftNet | 66.7 / 62.4 | 25.3 / 15.3 | 67.8 / 60.4
SG-KA | 66.5 / 58.0 | - / - | 67.2 / 57.3
PullNet | 68.1 / - | 24.8 / - | - / -
Ours | 66.9 / 60.1 | 27.2 / 17.1 | 68.4 / 60.6

Table 2: Hits@1 / F1 scores on WebQSP.

4.2 Baseline Methods

We compare our method with the following models:

• KVMemNet (Miller et al., 2016) is an end-to-end memory network which stores KB facts and text as key-value pairs.

• GraftNet (Sun et al., 2018) combines KB and text with an early fusion strategy and applies a graph-based model.

• SG-KA Reader (Xiong et al., 2019) proposes two components to reason over the KB and incorporate entity information into text.

• PullNet (Sun et al., 2019) is a QA framework that learns to retrieve a small subgraph related to answering the question.

4.3 Training Details

The model is implemented in PyTorch (Paszke et al., 2019) and trained on one Nvidia Tesla P40 GPU. We apply 100-dimensional TransE embeddings (Bordes et al., 2013) for entities and relations, and 300-dimensional GloVe embeddings (Pennington et al., 2014) for question and text words. The numbers of words in questions and documents are limited to 10 and 50, respectively. The hidden size is set to 100. We select the hyperparameter values by manual tuning for the best results on the validation dataset. The dropout is 0.2, and the batch size is 8. The GCN layer number L1 and the HGCN layer number L2 are 1 and 2, respectively. The average runtime for one epoch is 5 minutes, and we set the max number of epochs to 200. The number of parameters is 69 million. The Adam optimizer (Kingma and Ba, 2015) is applied to minimize the binary cross-entropy loss with a learning rate of 0.0005. The threshold for F1 is set to 0.05.

4.4 Results

Main Results: The metrics adopted in the experiments are Hits@1, which is the accuracy of the top answer predicted by the model, and F1, which represents the ability to predict all answers. As shown in Table 2, we experiment with our model under the KB-only, Text-only, and KB+Text settings and compare it with the baseline methods. Our model obtains competitive performance in the KB-only setting and achieves the best results in the other two settings. Especially in the Text-only setting, Hits@1 and F1 are 1.9% and 1.8% higher than the second-best method, respectively, which shows the validity of treating documents as hyperedges. The promising performance may inspire handling similar tasks by building plain text into a hypergraph and applying efficient HGCN. In the KB+Text setting, our method also achieves the best performance, proving that our proposed enhancement strategy can effectively enhance an incomplete KB by fully introducing the semantic and structural information implied in the text. In particular, our model improves more over its KB-only counterpart than the model of Sun et al. (2018) does, which demonstrates that treating documents as hyperedges is more productive than regarding them as heterogeneous nodes.

Different KB Settings: Following the work of Sun et al. (2018), in which the KB is downsampled to different extents, we experiment on 10%, 30%, and 50% KB settings, where the percentage represents the proportion of required evidence covered by the KB, to simulate an incomplete KB and analyze the impact of the text on model performance. As shown in Table 3, our model obtains promising performance in the KB-only setting; in particular, the F1 metric achieves the highest values in all settings.
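For reference, the two metrics can be sketched in a few lines: Hits@1 checks whether the top-scoring entity is a gold answer, and F1 compares the set of entities whose probability exceeds the threshold (0.05 above) against the gold answer set. A minimal sketch with made-up scores (not the official evaluation script):

```python
def hits_at_1(scores, gold):
    """1.0 if the highest-scoring entity is a gold answer, else 0.0."""
    top = max(scores, key=scores.get)
    return 1.0 if top in gold else 0.0

def f1(scores, gold, threshold=0.05):
    """Set F1 between thresholded predictions and the gold answer set."""
    pred = {e for e, s in scores.items() if s > threshold}
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

# Toy example: three candidate entities, two gold answers.
scores = {"e1": 0.9, "e2": 0.4, "e3": 0.01}
gold = {"e1", "e3"}
print(hits_at_1(scores, gold))   # 1.0: top answer e1 is gold
print(f1(scores, gold))          # 0.5: pred = {e1, e2}, P = R = 0.5
```

Hits@1 rewards only the single best prediction, while the thresholded F1 captures how well the model recovers the full answer set, which matters for the multi-answer questions in WebQSP.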
Model | 10% KB-only | 10% KB+Text | 30% KB-only | 30% KB+Text | 50% KB-only | 50% KB+Text
KVMem | 12.5 / 4.3 | 24.6 / 14.4 | 25.8 / 13.8 | 27.0 / 17.7 | 33.3 / 21.3 | 32.5 / 23.6
GraftNet | 15.5 / 6.5 | 31.5 / 17.7 | 34.9 / 20.4 | 40.7 / 25.2 | 47.7 / 34.3 | 49.9 / 34.7
SG-KA | 17.1 / 7.0 | 33.6 / 18.9 | 35.9 / 20.2 | 42.6 / 27.1 | 49.2 / 33.5 | 52.7 / 36.1
PullNet | - / - | - / - | - / - | - / - | 50.3 / - | 51.9 / -
Ours | 18.3 / 7.9 | 33.7 / 19.9 | 35.2 / 21.0 | 42.8 / 27.5 | 49.3 / 34.3 | 52.8 / 37.1

Table 3: Hits@1 / F1 scores on WebQSP under incomplete KB settings (10%, 30%, and 50% KB).
[Figure: Hits@1 of KVMem, GraftNet, SG-KA, and Ours under different KB settings.]

Model (10% KB+Text) | Hits@1 | F1
Full Model | 33.7 | 19.9
−GCN attention | 33.3 | 19.3
References

Song Bai, Feihu Zhang, and Philip H. S. Torr. 2019. Hypergraph convolution and hypergraph attention. CoRR, abs/1901.08150.

Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale simple question answering with memory networks. CoRR, abs/1506.02075.

Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, Lake Tahoe, Nevada, United States, pages 2787–2795.

Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, Volume 1: Long Papers, pages 1870–1879. Association for Computational Linguistics.

Rajarshi Das, Manzil Zaheer, Siva Reddy, and Andrew McCallum. 2017. Question answering on knowledge bases and text using universal schema and memory networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, Volume 2: Short Papers, pages 358–365. Association for Computational Linguistics.

Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. 2019. Hypergraph neural networks. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, pages 3558–3565. AAAI Press.

Taher H. Haveliwala. 2002. Topic-sensitive PageRank. In Proceedings of the Eleventh International World Wide Web Conference, WWW 2002, Honolulu, Hawaii, USA, pages 517–526. ACM.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA.

Alexander H. Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, pages 1400–1409. The Association for Computational Linguistics.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, NeurIPS 2019, Vancouver, BC, Canada, pages 8024–8035.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, pages 1532–1543. ACL.

Haitian Sun, Tania Bedrax-Weiss, and William W. Cohen. 2019. PullNet: Open domain question answering with iterative retrieval on knowledge bases and text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, pages 2380–2390. Association for Computational Linguistics.

Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, and William W. Cohen. 2018. Open domain question answering using early fusion of knowledge bases and text. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pages 4231–4242. Association for Computational Linguistics.

Alona Sydorova, Nina Pörner, and Benjamin Roth. 2019. Interpretable question answering on knowledge bases and text. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, Volume 1: Long Papers, pages 4943–4951. Association for Computational Linguistics.

Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6:287–302.

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21.
Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, and William Yang Wang. 2019. Improving question answering over incomplete KBs with knowledge-aware reader. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, Volume 1: Long Papers, pages 4258–4264. Association for Computational Linguistics.

Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, and Partha P. Talukdar. 2019. HyperGCN: A new method for training graph convolutional networks on hypergraphs. In Advances in Neural Information Processing Systems 32, NeurIPS 2019, Vancouver, BC, Canada, pages 1509–1520.

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pages 2369–2380. Association for Computational Linguistics.

Wen-tau Yih, Matthew Richardson, Christopher Meek, Ming-Wei Chang, and Jina Suh. 2016. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany, Volume 2: Short Papers. The Association for Computer Linguistics.