Knowledge Enhanced Semantic Communication
arXiv:2302.07727v2 [cs.CL] 15 Apr 2023

B. Wang, R. Li and J. Zhu are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: {wangbingyan, lirongpeng, zhujh20}@zju.edu.cn).
Z. Zhao and H. Zhang are with Zhejiang Lab, Hangzhou, China as well as the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: zhaozf@zhejianglab.com, honggangzhang@zju.edu.cn).

Abstract—In recent years, with the rapid development of deep learning and natural language processing technologies, semantic communication has become a topic of great interest in the field of communication. Although existing deep learning-based semantic communication approaches have shown many advantages, they still do not make sufficient use of prior knowledge. Moreover, most existing semantic communication methods focus on the semantic encoding at the transmitter side, while we believe that the semantic decoding capability of the receiver also deserves attention. In this paper, we propose a knowledge enhanced semantic communication framework in which the receiver can more actively utilize the facts in the knowledge base for semantic reasoning and decoding, while affecting only the parameters rather than the structure of the neural networks at the transmitter side. Specifically, we design a transformer-based knowledge extractor to find relevant factual triples for the received noisy signal. Extensive simulation results on the WebNLG dataset demonstrate that the proposed receiver yields superior performance on top of the knowledge graph enhanced decoding.

Index Terms—Semantic communication, knowledge graph, Transformer.

I. INTRODUCTION

Benefiting from the rapid development of deep learning (DL) and natural language processing (NLP), semantic communications emerge with a special emphasis on the successful delivery of the semantics of a message, rather than the conventional bit-level accuracy in traditional communication. There have been some interesting studies on semantic communication [1]–[5]. Among them, one popular paradigm is DL-based joint source-channel coding (JSCC). For example, Ref. [1] proposes a transformer-based semantic communication system for text transmission. Ref. [2] introduces a semantic communication system based on Universal Transformer (UT) with an adaptive circulation mechanism. In order to reduce the semantic transmission error, Ref. [3] exploits hybrid automatic repeat request (HARQ), while Ref. [4] introduces an adaptive bit rate control mechanism. Moreover, Ref. [5] proposes a masked autoencoder (MAE) based system to robustly combat the possible noise. Notably, a key assumption of these studies is that both transmitter and receiver share common knowledge. On top of this assumption, the existing semantic communication methods jointly train the DL-based transmitter and receiver, and have proven their superiority over traditional communication methods. However, the receiver still lacks comprehensive knowledge understanding and reasoning ability, and cannot make full use of the implicit prior knowledge in complex sentences.

In order to improve the capability of knowledge understanding and reasoning, some studies propose to introduce the knowledge graph (KG), which stores human knowledge with a graph structure composed of entities and relationships [6], into semantic communication. In KGs, each fact is abstracted into a triple in the form of (entity-relationship-entity). For example, Ref. [7] utilizes knowledge triples to represent the semantic information and evaluates the importance of each triple by an attention policy gradient algorithm. Ref. [8] proposes a semantic communication framework by encoding texts into KGs. Ref. [9] introduces a knowledge reasoning based semantic communication system. In Ref. [10], a reliable semantic communication system based on KG is proposed, which can adaptively adjust the transmitted triples according to channel quality. In Ref. [11], the authors exploit the knowledge base by leveraging a logic programming language. In Ref. [12], the authors propose a semantic similarity-based approach to automatically identify and extract the most common concepts from the knowledge base.

Knowledge graphs have somewhat improved the capability of semantic communication systems to handle common knowledge. However, most existing works only consider optimizing the transmitter while ignoring the receiver. Typically, their transmitters achieve the semantic encoding by capturing and embedding the factual triples from the sentences with knowledge graphs. Nonetheless, it is a great challenge for a knowledge base to cover all the semantic information of a sentence, and the missing information may be detrimental to the communication efficiency. For example, a sentence like “She loves him” cannot be represented by any factual triples in the knowledge base, yet it may be a vital element of a transmitted text, which makes the failure to convey it unacceptable. Instead, knowledge graphs can only describe the semantics of simple declarative sentences. Therefore, sending messages that are only encoded by knowledge graph-based triples may cause extra semantic loss.

To address these issues, we propose a novel receiver-side scheme for semantic communication based on KG. Different from existing works that extract factual triples at the transmitter side as the semantic representations, we apply a knowledge extraction module at the receiver side as a semantic decoding assistant to avoid the injection of extra semantic noise and enhance the model’s robustness
TABLE I
NOTATIONS USED IN THIS PAPER
II. SYSTEM MODEL AND PROBLEM FORMULATION

A semantic communication system generally encompasses a semantic encoder and decoder, as depicted in Fig. 1. Without loss of generality, we denote the input sentence as $s = [s_1, s_2, \ldots, s_N] \in \mathbb{N}^N$, where $s_i$ represents the $i$-th word (i.e., token) in the sentence. In particular, the transmitter consists of two modules, that is, the semantic encoder and the channel encoder. The semantic encoder $S_\beta(\cdot)$ extracts the semantic information in the content and represents it as a vector $h \in \mathbb{R}^{N \times d_s}$, where $d_s$ is the dimension of each semantic symbol. Mathematically,

$$h = S_\beta(s), \tag{1}$$

and then the channel encoder $C_\alpha(\cdot)$ encodes $h$ into symbols that can be transmitted over the physical channel as

$$x = C_\alpha(h), \tag{2}$$

[…]

$$k = K_\theta(\hat{h}), \tag{5}$$

where $K_\theta(\cdot)$ represents the knowledge extractor.

Then the knowledge enhanced semantic decoder $S_\gamma^{-1}(\cdot)$ leverages the channel decoded vector $\hat{h}$ and the extracted knowledge vector $k$ to obtain the received message $\hat{s} = [\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_N]$

$$\hat{s} = S_\gamma^{-1}(\hat{h} \,\|\, k), \tag{6}$$

where $S_\gamma^{-1}(\cdot)$ stands for the knowledge enhanced semantic decoder, and $\|$ indicates a concatenation operator.

The accuracy of semantic communication is determined by the semantic similarity between the sent and received contents. In order to minimize the semantic errors between $s$ and $\hat{s}$, the loss function can take account of the cross entropy of the two vectors

$$\mathcal{L}_{\mathrm{model}} = -\sum_{i=1}^{N} q(s_i) \log p(\hat{s}_i), \tag{7}$$
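As a rough illustration of Eqs. (6) and (7), the sketch below concatenates a channel-decoded vector with a knowledge vector and evaluates the cross entropy against the true tokens. It is only a toy under stated assumptions: the concatenation is taken per token along the feature axis, $q(s_i)$ is treated as one-hot, and the "decoder" is a stand-in that reads each concatenated row as logits. The helper names `concat`, `softmax` and `cross_entropy` and all numeric values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def concat(h_hat, k):
    """h_hat || k, assumed here to act per token along the feature axis."""
    return [h_row + k_row for h_row, k_row in zip(h_hat, k)]

def cross_entropy(token_ids, probs):
    """Eq. (7) with one-hot q(s_i): minus the summed log-probability of each true token."""
    return -sum(math.log(p_row[s_i]) for s_i, p_row in zip(token_ids, probs))

# Toy values (hypothetical): N = 2 tokens, d_s = 2, one knowledge feature per token.
h_hat = [[0.2, -1.0], [1.5, 0.3]]   # channel-decoded vectors
k = [[0.7], [0.1]]                  # extracted knowledge vector
decoder_in = concat(h_hat, k)       # input to the semantic decoder in Eq. (6)

# Stand-in for the decoder (assumption): read each 3-dim row as logits over a 3-word vocabulary.
probs = [softmax(row) for row in decoder_in]
s = [2, 0]                          # true token ids
loss = cross_entropy(s, probs)
print(decoder_in, round(loss, 4))
```

With a one-hot target, only the probability assigned to the true token contributes to each term, so the loss shrinks as the decoder becomes more confident in the transmitted words.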
[…] to utilize end-to-end DNNs to accomplish the whole communication process, as shown in Fig. 1. The semantic encoders and decoders are typically based on transformers [13]. Meanwhile, […] an autoencoder implemented by fully connected layers. The whole semantic communication process is then reformulated

[Figure: block diagram of the knowledge extractor. Recoverable labels: Channel Decoder; Encoding (Multi-Head Attention, Add & Norm, Feed Forward, L layers with outputs z^(1), ..., z^(L)); Classification & Embedding (Linear, Sigmoid, 0/1 outputs); Knowledge Base with example triples (<h>Batchoy, <r>Country, <t>Philippines) and (<h>Batchoy, …, <t>Beef); Knowledge Embedding.]
1: Require: models $S_\beta(\cdot)$, $C_\alpha(\cdot)$, $S_\gamma^{-1}(\cdot)$, $C_\delta^{-1}(\cdot)$ and $K_\theta(\cdot)$
2: Input: tokenized sentence $s$
3: Output: decoded sentence $\hat{s}$
4: Transmitter:
5:   Semantic encoding: $h \leftarrow S_\beta(s)$
6:   Channel encoding: $x \leftarrow C_\alpha(h)$
7: Transmit $x$ over the physical channel: $y \leftarrow Hx + n$
8: Receiver:
9:   Channel decoding: $\hat{h} \leftarrow C_\delta^{-1}(y)$
10:  Knowledge extraction $K_\theta(\cdot)$:
11:    Compute the embedding representation $z^{(L)}$
12:    $\hat{t} \leftarrow \mathrm{sigmoid}(z^{(L)} W_t + b_t)$
13:    Find the triples $\{m_i\}$ where $\hat{t}_i \geq 0.5$
14:    Knowledge embedding: $k \leftarrow f_k(\{m_i\})$
15:  Semantic decoding: $\hat{s} \leftarrow S_\gamma^{-1}(\hat{h} \,\|\, k)$

The training complexity of a knowledge extractor is $O(L N^2 \cdot d_k)$, which is of the same order as the transformer encoder. Notably, the knowledge extractor is not limited to the conventional transformer structure, but can also be applied to different transformer variants, such as Universal Transformer (UT) [2]. With the self-attention mechanism, the extracted factual triples can provide additional prior knowledge to the semantic decoder and therefore improve the performance of the decoder. Typically, the knowledge vector is concatenated to the received message, rather than being merged into the source signal as previous works suggested. This architecture ensures that when the extractor is of little avail, the model can still function as a standard encoder-decoder transformer structure, while avoiding possible semantic losses introduced by the knowledge extraction procedure. Therefore, even if the knowledge extractor fails to find any relevant knowledge, the model still performs comparably to the baseline.

IV. NUMERICAL RESULTS

A. Dataset and Parameter Settings

The dataset used in the numerical experiment is based on WebNLG v3.0 [14], which consists of data-text pairs where the data is a set of triples extracted from DBpedia and the text is the verbalization of these triples. In this numerical experiment, the weight parameter $w$ is set to 0.02, while the learning rate is set to $10^{-4}$. Moreover, we set the dimension of the dense layer as $128 \times 16$, and adopt an 8-head attention in the transformer layer. The detailed settings of the proposed system are shown in Table II. We train the models based on both the conventional transformer and UT [2]. Besides, we adopt two metrics to evaluate their performance, that is, the 1-gram Bilingual Evaluation Understudy (BLEU) [15] score for measuring word-level accuracy and the Sentence-Bert [16] score for measuring semantic similarity. Notably, Sentence-Bert is a Siamese Bert-network model that generates fixed-length vector representations for sentences, while the Sentence-Bert score computes the cosine similarity of embedded vectors.

Parameter                    Value
Train dataset size           24,467
Test dataset size            2,734
Weight parameter w           0.02
DNN optimizer                Adam
Batch size                   32
Model dimension              128
Learning rate                10^-4
Channel vector dimension     16
Number of attention heads    8

B. Numerical Results

Fig. 3 and Fig. 4 show the BLEU score and Sentence-Bert score of the transformer model with respect to the signal-to-noise ratio (SNR), respectively. It can be observed that the knowledge extractor significantly improves the performance. In particular, regardless of the channel type, the knowledge extractor can always bring more than 5% improvement in BLEU under low SNR scenarios. For the Sentence-Bert score, the knowledge-enhanced receiver also shows a similar performance improvement. This demonstrates that the proposed scheme can improve the comprehension of semantics at the receiver side. Fig. 5 and Fig. 6 demonstrate the performance of the UT model under both the BLEU and Sentence-Bert metrics, and a similar performance improvement can also be observed.

On the other hand, we test the performance of the knowledge extractor under different SNRs. As shown in Fig. 7, the extractor model can obtain a recall rate of over 90%. However, the received content may be polluted by noise, resulting in an increase in false positives and leading to a large gap between precision and recall. The number of encoder layers in the knowledge extractor may also affect the performance of the model. Therefore, we also implement the knowledge extractor with different numbers of transformer encoder layers and present the performance comparison in Fig. 8. It can be observed that the 6-layer model performs slightly better than the 3-layer model. However, the performance remains almost unchanged when the depth is further increased to 9 layers. Furthermore, in addition to utilizing a fixed model trained at a certain SNR, it is also possible to leverage several SNR-specific models, each corresponding to a specific SNR. Table III demonstrates the performance comparison between the 0dB-specific and fixed models. It can be observed that compared to the fixed model, the SNR-specific model could yield superior performance improvements. As a comparison, we also implement a scheme that utilizes the knowledge extractor for semantic encoding at the transmitter. Fig. 9 presents the corresponding simulation results, and it can be observed that this transmitter-based scheme is significantly inferior to the proposed scheme.

V. CONCLUSION

In this paper, we have proposed a knowledge graph enhanced semantic communication framework in which the receiver can utilize prior knowledge from the knowledge graph for semantic decoding while requiring no additional
Fig. 3. The transformer model performance of BLEU score with respect to SNR.
Fig. 4. The transformer model performance of Sentence-Bert score with respect to SNR.
Fig. 5. The Universal Transformer model performance of BLEU score with respect to SNR.
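The two metrics plotted in these figures, as described in Section IV-A, can be sketched in a few lines. This is a minimal illustration and not the authors' evaluation code: `bleu1` implements clipped unigram precision with the standard brevity penalty, and `cosine_similarity` is applied to made-up stand-in vectors, since real Sentence-Bert embeddings would come from a pretrained model that is not loaded here. Function names and example sentences are hypothetical.

```python
import math
from collections import Counter

def bleu1(reference, candidate):
    """1-gram BLEU: clipped unigram precision times the brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate word is credited at most as often as it occurs in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = clipped / len(cand)
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sent = "batchoy is a dish from the philippines"
recv = "batchoy is a soup from the philippines"
print(bleu1(sent, recv))                           # 6 of 7 words match

# Stand-in vectors (assumption): in practice these are Sentence-Bert embeddings.
print(cosine_similarity([1.0, 2.0, 0.5], [0.9, 2.1, 0.4]))
```

The example shows why the two metrics complement each other: a single substituted word lowers the 1-gram BLEU score by a fixed amount regardless of meaning, whereas the embedding-based score depends on how far the substitution moves the sentence vector.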
Fig. 6. The Universal Transformer model performance of Sentence-Bert score with respect to SNR.
Fig. 7. The performance of knowledge extractor.
Fig. 8. The performance comparison under different numbers of encoder layers for extractor.
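The gap between precision and recall reported for the knowledge extractor follows from the thresholding step of the extraction procedure (sigmoid scores, triples kept when the score is at least 0.5). The sketch below is a hedged illustration of that step and of the two rates: the per-triple scores stand in for the extractor's pre-sigmoid outputs, and apart from the Batchoy example the triples are placeholders; none of the numbers come from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def select_triples(logits, triples, threshold=0.5):
    """Keep every triple whose sigmoid score reaches the threshold."""
    return [m for m, z in zip(triples, logits) if sigmoid(z) >= threshold]

def precision_recall(predicted, relevant):
    """Precision and recall of the predicted triples against the relevant ones."""
    predicted, relevant = set(predicted), set(relevant)
    true_pos = len(predicted & relevant)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical knowledge base; e1..e4 and r1, r2 are placeholder names.
kb = [("Batchoy", "Country", "Philippines"), ("e1", "r1", "e2"), ("e3", "r2", "e4")]
logits = [2.0, 0.3, -1.5]     # made-up scores; noise pushed the second above zero
relevant = [kb[0]]            # only the first triple belongs to the sent sentence

selected = select_triples(logits, kb)
precision, recall = precision_recall(selected, relevant)
print(selected)               # the first two triples pass the 0.5 threshold
print(precision, recall)      # 0.5 1.0
```

In this toy case the relevant triple is still found, so recall stays at 1.0, while the noise-induced false positive halves the precision, which mirrors the qualitative behaviour described for the extractor.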
TABLE III
THE PERFORMANCE COMPARISON BETWEEN A FIXED EXTRACTOR MODEL AND SNR-SPECIFIC MODELS.

SNR/dB    Fixed     SNR-specific
-4        0.6514    0.6718
-2        0.7936    0.8126
0         0.8661    0.8661
2         0.9025    0.9134
4         0.9164    0.9201

[…] 2675, 2021.
[2] Q. Zhou, R. Li et al., “Semantic communication with adaptive universal transformer,” IEEE Wireless Commun. Lett., vol. 11, no. 3, pp. 453–457, 2021.
[3] P. Jiang, C.-K. Wen et al., “Deep source-channel coding for sentence semantic transmission with HARQ,” IEEE Trans. Commun., vol. 70, no. 8, pp. 5225–5240, 2022.
[4] Q. Zhou, R. Li et al., “Adaptive bit rate control in semantic communication with incremental knowledge-based HARQ,” IEEE Open J. Commun. Soc., vol. 3, pp. 1076–1089, 2022.
[5] Q. Hu, G. Zhang et al., “Robust semantic communications against