Transformer and Graph Convolutional Network For Text Classification
https://doi.org/10.1007/s44196-023-00337-z
RESEARCH ARTICLE
Abstract
Graph convolutional network (GCN) is an effective tool for feature clustering. However, in the text classification task, the
traditional TextGCN (GCN for Text Classification) ignores the context word order of the text. In addition, TextGCN constructs
the text graph only according to the context relationship, so it is difficult for the word nodes to learn an effective semantic
representation. To address these problems, this paper proposes a text classification method that combines Transformer and GCN. To
improve the semantic accuracy of word node features, we add part-of-speech (POS) information to the word-document graph and build
edges between words based on POS. Between the layers of GCN, the Transformer is used to extract the contextual and
sequential information of the text. We conducted experiments on five representative datasets. The results show that our
method effectively improves the accuracy of text classification and outperforms the comparison methods.
✉ Weili Guan, 2113391032@st.gxu.edu.cn
✉ Zhiheng Lu, 8812316@163.com
Boting Liu, 971328422@qq.com
Changjin Yang, 503141588@qq.com
Zhijie Fang, nnfang@163.com
1 School of Computer, Electronics and Information, Guangxi University, Nanning 530004, Guangxi, China
2 College of Digital Economics, Nanning University, Nanning 530299, Guangxi, China
3 College of Electrical Engineering, Guangxi University of Science and Technology, Liuzhou 545006, Guangxi, China
4 School of Mechanical Engineering, Guangxi University, Nanning 530004, Guangxi, China

1 Introduction

Text classification plays a pivotal role in natural language processing (NLP) [1]. It involves the automated categorization of text through computer technology and is widely applied in sentiment analysis, document classification, and public opinion analysis, among other domains. Improving the precision of text classification is key to resolving pertinent real-world problems and enhancing the overall quality of life. Hence, this paper focuses on refining existing text classification techniques to improve classification accuracy.

Existing text classification technologies are mainly based on machine learning and deep learning methods [2]. Common machine learning methods in text classification tasks include Support Vector Machine (SVM) [3], K-Nearest Neighbor (KNN) [4], and Random Forest (RF) [5], which have achieved excellent performance in simple classification tasks [5]. However, machine learning methods based on statistical techniques struggle to achieve the desired performance on complex real-life tasks, such as medical text diagnosis and sentiment analysis. With the breakthrough development of word vector technology in deep learning, words are given contextual semantics in the form of vectors [6]. Deep learning methods based on word vectors have achieved excellent results and have gradually become the mainstream approach in NLP [1]. In text classification, the most commonly used deep learning methods are Convolutional Neural Network (CNN) [7], Recurrent Neural Network (RNN) [8], Transformer [9], and Graph Convolutional Network (GCN) [10]. Because of its limited convolutional kernel size, CNN focuses more on extracting local feature information of text. RNN takes into account the role of each token in the text, so it focuses more on extracting global feature information, but it is prone to vanishing gradients. Transformer is a powerful feature extraction tool that combines attention mechanisms to obtain stronger contextual relationships for text, and is an innovative technique in NLP. TextGCN achieves text classification by constructing a word-document graph structure, and GCN focuses more on the spatial feature information of the text. Moreover, in [10], GCN achieves better text classification accuracy than CNN and RNN. Therefore, we believe that GCN has great potential for text classification tasks. We combine Transformer with GCN to obtain higher text classification performance.

Nodes in the GCN simultaneously assimilate information from their neighboring nodes. However, this implies that in a text classification task, document nodes consider all words within the document simultaneously, disregarding the text's sequential order. Different sentence structures convey different meanings, which underscores the importance of preserving text order. Consequently, we posit that improving GCN's efficacy in text classification requires incorporating knowledge about text sequences. Moreover, the scope of semantics attainable solely through contextual relationships in word-document graphs is inherently limited. Building upon this premise, our paper introduces a novel text classification approach that combines Transformer and GCN and capitalizes on the strengths of both models. The principal contributions of our study are as follows:

• To tackle the issue of GCN overlooking textual order, we integrate the Transformer between the graph convolutional layers, forming what we refer to as a Graph Convolution Layer-Transformer-Graph Convolution Layer (GTG). The Transformer enhances the contextualization of textual information and takes the crucial textual order into account. The resulting Transformer output is combined with GCN to yield a more precise semantic representation of document nodes.
• To address the issue of limited semantic information in word node vectors within GCN, we suggest constructing word-document graphs based on POS tagging. This approach gives words POS-related semantics, thereby enhancing the overall semantic quality of word node vectors.

2 Related Work

2.1 TextGCN

In the early days, GCN was mainly applied to tasks with obvious spatial structure, such as social networks and knowledge graphs. In 2019, Yao et al. [10] applied GCN to text classification tasks for the first time and achieved good document classification performance. TextGCN, a model based on semi-supervised learning, has enhanced the training difficulty to a certain extent. This enables TextGCN to achieve good fitting performance through only two layers of graph convolution. Moreover, in TextGCN, documents and words form a heterogeneous graph structure, allowing for the learning of information at the word and document levels. Since then, more and more researchers have applied GCN to text classification [11].

2.2 Recent Works

Huang et al. [12] propose a document-level GCN-based text classification method. Unlike TextGCN, their method constructs each document as a separate graph. The computational cost of GCN is optimized to achieve better classification performance than TextGCN and to support online classification of documents. Liu et al. [13] propose a text classification method based on text graph tensors. It uses three different graph constructions (semantic, syntactic, and sequential) to coordinate the information between different types of graphs and achieves better classification performance than TextGCN. Xue et al. [14] propose IMGCN, a text classification method based on GCN with Bidirectional Long Short-Term Memory (BiLSTM). They use WordNet [15] with a syntactic dependency graph construction method and use BERT to obtain the embedding representation of word nodes. A bidirectional LSTM with attention further extracts the contextual relationships of the text and is combined with residual concatenation to obtain the classification results. Dong et al. [16] propose a text classification method combining BiGRU and GCN. The word embedding representation is obtained by Word2vec [17], the contextual information of the text is extracted by a Bidirectional Gated Recurrent Unit (BiGRU) [18], and the spatial information of the text is extracted by GCN. Ye et al. [20] propose a short text classification method based on GCN and BERT [19]. A word-document-topic graph structure is constructed using the Biterm Topic Model (BTM) [21] to obtain the topics of documents. The word node features after GCN iteration are fused with the word features output by BERT and fed to a BiLSTM, which extracts the contextual semantics of the text; these are finally fused with the document node features to obtain the classification results. Lin et al. [22] propose a text classification model based on BERT with GCN. They initialize the node vectors of GCN with BERT and jointly train GCN and BERT to fully utilize the advantages of each model. Wang et al. [23] propose a GCN text classification method based on inductive graphs. The original dataset was statistically summarized into small graphs, and good classification results were obtained
International Journal of Computational Intelligence Systems (2023) 16:161 Page 3 of 11 161
Fig. 2 Building graph based on POS. As shown in this figure, words of the same POS nature are linked by edges, allowing the words to obtain a
POS-based semantic representation
Fig. 3 Building graph based on context. Compute the relationships between words in the sliding window, so that the word nodes have context-based
semantic representations
GTG After constructing a word-document graph, the graph node features are first updated by the first graph convolution layer (GCL). The word nodes are then input into the Transformer to extract contextual semantics, along with the text's sequential order information. Finally, the Transformer's output is integrated with the document nodes to augment their features, forming the input for the second GCL.

3.2 Data Preprocessing

Our data preprocessing closely follows the approach outlined in [10], with a modification to the graph structure that lets words acquire POS-related information.

To begin, we segment the words within each document. We then employ the Natural Language Toolkit (NLTK) [24] for POS tagging of the words. After analyzing each document, we establish connections between words sharing the same POS tag. All connections between words with identical POS tags are assigned equal significance, resulting in an edge weight of 1. A detailed illustration of this procedure is presented in Fig. 2.

Next, we establish word relationships grounded in context. Each document is scanned using a sliding window of length 20, and we count the occurrences of individual words within the windows. Additionally, we count how often word pairs appear within the same window. The detailed procedure is illustrated in Fig. 3.

After the processing illustrated in Fig. 3, we have derived the word-to-word relationships. We then establish word-to-word edges based on contextual information. The weights of these word-to-word edges are determined by Eqs. (1), (2), and (3)

\[ p(i) = \frac{N_i}{N_w} \tag{1} \]
\[ p(i, j) = \frac{N_{ij}}{N_w} \tag{2} \]
\[ \mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\, p(j)}. \tag{3} \]

In Eqs. (1), (2), and (3), $N_w$ represents the total number of sliding windows, $N_i$ corresponds to the frequency of occurrence of term $i$ across all sliding windows, and $N_{ij}$ indicates the co-occurrence frequency of terms $i$ and $j$ within the same sliding windows. The Pointwise Mutual Information (PMI) [25] is employed to quantify the relationship between these two terms, with higher PMI values indicating a stronger association between them. The logarithm in Eq. (3) follows the standard PMI definition, under which two independent terms have a PMI of 0. Therefore, a PMI greater than 0 signifies a substantial correlation between the two words, and an edge with the corresponding weight is established between them.

Next, we establish connections between documents and words, treating each document as a node within the graph. Document nodes are connected to the words present in the respective documents. The weights of these document-word connections are determined using Term Frequency-Inverse Document Frequency (TF-IDF) [26], with the corresponding formulas presented in Eqs. (4), (5), and (6)

information within the text. The specifics of this approach are illustrated in Fig. 5.

In Fig. 5, the output of the Transformer is fused with the document node vector, which is shown in Eq. (7)
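The word-to-word edge construction of Section 3.2 (POS links with weight 1, and sliding-window PMI weights as in Eqs. (1), (2), and (3)) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `pos_edges` and `pmi_edges` are hypothetical, POS tags are assumed to be supplied already (for instance from an NLTK tagger), $N_i$ is assumed to count the windows that contain word $i$, PMI is taken in its standard logarithmic form, and a document shorter than the window is assumed to yield a single window.

```python
import math
from collections import Counter
from itertools import combinations


def pos_edges(tagged_words):
    """Link every pair of distinct words sharing a POS tag with weight 1.

    tagged_words: iterable of (word, tag) pairs, assumed to be pre-tagged.
    """
    by_tag = {}
    for word, tag in tagged_words:
        by_tag.setdefault(tag, set()).add(word)
    edges = {}
    for words in by_tag.values():
        for a, b in combinations(sorted(words), 2):
            edges[(a, b)] = 1.0
    return edges


def pmi_edges(docs, window=20):
    """Word-word PMI weights from sliding windows (log form of Eq. (3))."""
    n_windows = 0
    word_count = Counter()  # N_i: number of windows containing word i
    pair_count = Counter()  # N_ij: number of windows containing both i and j
    for tokens in docs:
        if not tokens:
            continue
        # A document shorter than the window contributes one window.
        for start in range(max(1, len(tokens) - window + 1)):
            seen = sorted(set(tokens[start:start + window]))
            n_windows += 1
            word_count.update(seen)
            pair_count.update(combinations(seen, 2))
    edges = {}
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / n_windows
        p_a = word_count[a] / n_windows
        p_b = word_count[b] / n_windows
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > 0:  # keep only positively associated pairs
            edges[(a, b)] = pmi
    return edges
```

For example, `pmi_edges([["a", "b"], ["a", "b"], ["c", "d"], ["a", "c"]], window=2)` keeps the pairs `("a", "b")` and `("c", "d")` and drops `("a", "c")`, whose PMI is negative. Edge keys are stored with the two words in sorted order, so each undirected edge appears once.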
GCN can be readily employed to analyze graph-structured data, effectively capturing spatial relationships among nodes and facilitating node classification [29]. Over the past years, the potential of GCN in text classification has attracted increasing attention from researchers, leading to its growing adoption in various text classification tasks.

To facilitate efficient computation, GCN expresses all its operations as matrix multiplications and therefore represents and processes graph structures as adjacency matrices. In the initial graph convolutional layer (GCL), node updates are determined by Eqs. (14) and (15)

\[ \hat{A} = D^{-1/2} A D^{-1/2} \tag{14} \]
\[ L^{(1)} = \rho(\hat{A} X W_o), \tag{15} \]

where $A$ is the adjacency matrix of the graph, $\rho$ is the ReLU function, $D$ is the degree matrix of $A$, $X$ is the node feature matrix, and $W_o$ is the weight matrix.

After the initial GCL update, each node has assimilated information from its neighboring nodes, so that the nodes possess specific spatial characteristics and exhibit a clustering effect. Subsequently, the text nodes from the initial layer are input into the Transformer to further extract contextual and sequential textual information. The ensuing step involves dimensionality reduction through the second GCL, yielding the final classification outcomes, as illustrated in Eq. (16)

\[ L^{(2)} = \rho(\hat{A} L^{(1)} W_o). \tag{16} \]

In Eq. (16), the feature input is refined as $L^{(1)}$. An additional convolution operation is then applied to $L^{(1)}$ to extract more intricate word-document spatial information. This refinement aims to amplify the clustering effect of the document nodes, ultimately contributing to the classification task. Consistent with the approach proposed in [10], we maintain a node dimension of 300 in this study.

Fig. 8 The attention heat map of “Natural language processing is an art”. The coordinate axes represent the word nodes, and the depth of the matrix square color is positively correlated with the attention score of the word nodes

4 Experimental Results

In this section, we experimentally verify the effectiveness and superiority of the proposed method.
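As a point of reference for the graph convolution updates in Eqs. (14), (15), and (16), the following sketch runs two graph convolution layers on a toy graph using only the Python standard library. It is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, the adjacency matrix is assumed to include self-loops, the node features are assumed to be one-hot, the two layers use separate weight matrices `w1` and `w2` (the equations reuse the symbol $W_o$), the normalization is the standard symmetric form, and the Transformer step between the two layers is omitted.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    """Element-wise ReLU, the activation rho in Eqs. (15) and (16)."""
    return [[max(0.0, v) for v in row] for row in m]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, as in Eq. (14)."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0
                for row in adj]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcl(a_hat, features, weights):
    """One graph convolution layer: rho(A_hat @ features @ weights)."""
    return relu(matmul(matmul(a_hat, features), weights))


# Toy graph: nodes 0 and 1 are words, node 2 is a document linked to both.
# Self-loops on the diagonal are an assumption of this sketch.
adj = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0],
       [1.0, 1.0, 1.0]]
x = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]           # one-hot node features (assumed)
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]   # arbitrary illustrative weights
w2 = [[0.5, -0.5], [0.25, 0.75]]

a_hat = normalize_adjacency(adj)
h1 = gcl(a_hat, x, w1)          # first GCL, Eq. (15)
h2 = gcl(a_hat, h1, w2)         # second GCL, Eq. (16)
```

In the method described here, the word-node rows of `h1` would first pass through the Transformer and be fused with the document nodes before the second layer; the sketch skips that step to keep the two matrix updates visible.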
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{17} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{18} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{19} \]
\[ F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{20} \]

True Positives (TP) is the number of positive samples predicted as positive; False Positives (FP) is the number of negative samples predicted as positive; True Negatives (TN) is the number of negative samples predicted as negative; False Negatives (FN) is the number of positive samples predicted as negative [30].

4.5 Results

In this section, we present the experimental findings along with a concise analysis of these results. The test accuracies of each approach are displayed in Table 3. For the document classification task in this study, we evaluated the test accuracy and F1 score of every model over ten runs and report the mean value together with the standard deviation. In Tables 3 and 4, the bolded results are significantly better than the other methods on the given dataset according to a t-test.

As indicated in Table 3, the proposed approach achieves the best classification performance on three datasets. Specifically, it achieves a classification accuracy of 86.96% on 20NG, 94.46% on R52, and 69.72% on Ohsumed. It outperforms BiGRU+GCN and BiLSTM+GCN by 0.19 and 0.41 percentage points on 20NG, by 0.58 and 0.26 percentage points on R52, and by 1.28 and 0.57 percentage points on Ohsumed. Furthermore, as detailed in Table 4, the proposed method also attains the highest F1 scores on 20NG, R52, and Ohsumed. These results underscore the superiority of the method presented in this paper for text classification tasks. They also indicate that the Transformer exhibits more robust feature extraction capabilities than the RNN structure and achieves a more precise semantic representation of tokens.

However, on the MR and R8 datasets, our classification performance lags behind the BiLSTM+GCN approach. This discrepancy suggests that the BTM within BiLSTM+GCN is more adept at capturing crucial information from shorter texts, revealing a limitation of our method on concise texts. Despite this, our method outperforms Transformer and TextGCN across all datasets, showing that the GTG structure effectively combines the strengths of Transformer and GCN to enhance the model's feature extraction ability.

The inclusion of POS in TextGCN (POS) results in performance enhancements across four datasets, as word

Table 3 Test accuracy (%) of each method (mean ± standard deviation)
Model             20NG           R8             R52            Ohsumed        MR
SVM               83.54 ± 1.66   96.71 ± 0.12   93.07 ± 0.59   63.00 ± 1.26   75.44 ± 0.32
KNN               67.78 ± 1.32   88.03 ± 0.86   85.44 ± 0.52   56.32 ± 1.39   70.15 ± 0.22
RF                77.54 ± 2.56   94.88 ± 1.53   87.58 ± 1.16   58.32 ± 1.32   69.41 ± 0.25
BiLSTM            73.20 ± 0.56   96.35 ± 1.32   90.39 ± 0.69   49.56 ± 1.22   77.53 ± 0.29
BiGRU             73.61 ± 0.36   96.55 ± 1.12   91.12 ± 0.62   49.11 ± 1.19   76.95 ± 0.33
CNN               82.25 ± 0.28   95.61 ± 0.77   87.56 ± 0.86   58.64 ± 1.02   77.62 ± 0.66
Transformer       74.26 ± 0.86   96.47 ± 1.32   92.12 ± 1.12   52.31 ± 0.94   76.56 ± 0.65
FastText          79.52 ± 0.46   94.59 ± 0.88   90.86 ± 0.34   55.61 ± 0.36   76.31 ± 0.52
TextGCN [10]      86.26 ± 0.16   96.80 ± 0.13   93.61 ± 0.14   68.32 ± 0.29   76.00 ± 0.48
TextGCN(POS)      86.38 ± 0.11   97.02 ± 0.12   93.54 ± 0.16   68.47 ± 0.42   76.53 ± 0.44
BiGRU+GCN [16]    86.77 ± 0.14   97.06 ± 0.13   93.88 ± 0.18   68.44 ± 0.33   77.56 ± 0.46
BiLSTM+GCN [20]   86.55 ± 0.12   97.38 ± 0.16   94.20 ± 0.18   69.15 ± 0.36   78.24 ± 0.44
Ours              86.96 ± 0.09   97.22 ± 0.10   94.46 ± 0.08   69.72 ± 0.13   77.24 ± 0.30
The proposed method demonstrated a significantly superior performance compared to the baselines on datasets including 20NG, R52, and Ohsumed, as determined by Student's t-test (p < 0.05)

Table 4 F1 score (%) of each method (mean ± standard deviation)
Model             20NG           R8             R52            Ohsumed        MR
SVM               83.26 ± 1.26   89.12 ± 1.06   68.64 ± 0.05   62.66 ± 0.62   76.32 ± 0.04
KNN               66.59 ± 2.14   82.64 ± 2.26   66.32 ± 1.12   53.12 ± 1.11   70.21 ± 0.12
RF                77.14 ± 1.36   86.64 ± 1.53   65.36 ± 2.16   52.61 ± 1.06   69.26 ± 0.52
BiLSTM            73.65 ± 0.26   88.55 ± 1.41   69.36 ± 0.33   48.66 ± 0.22   77.26 ± 0.26
BiGRU             73.33 ± 0.38   88.62 ± 1.23   69.44 ± 0.62   48.99 ± 0.52   76.82 ± 0.13
CNN               82.06 ± 0.33   88.76 ± 0.63   69.55 ± 0.33   53.16 ± 0.62   77.60 ± 0.32
Transformer       74.88 ± 0.75   88.26 ± 0.86   68.88 ± 0.56   52.69 ± 0.41   75.96 ± 0.32
FastText          78.24 ± 0.36   90.64 ± 0.63   69.71 ± 0.14   54.88 ± 0.26   76.22 ± 0.46
TextGCN [10]      85.02 ± 0.06   92.88 ± 0.06   70.17 ± 0.07   61.45 ± 0.35   75.58 ± 0.34
TextGCN(POS)      85.12 ± 0.06   93.25 ± 0.12   70.42 ± 0.22   62.06 ± 0.12   76.49 ± 0.33
BiGRU+GCN [16]    85.45 ± 0.14   93.42 ± 0.26   70.66 ± 0.12   62.16 ± 0.13   77.52 ± 0.31
BiLSTM+GCN [20]   85.40 ± 0.10   94.33 ± 0.32   70.96 ± 0.05   62.32 ± 0.21   78.20 ± 0.31
Ours              85.69 ± 0.11   93.66 ± 0.47   71.22 ± 0.07   62.77 ± 0.43   77.02 ± 0.21
The proposed method demonstrated a significantly superior performance compared to the baselines on datasets including 20NG, R52, and Ohsumed, as determined by Student's t-test (p < 0.05)
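The scores in Tables 3 and 4 rest on the metric definitions of Eqs. (17) to (20) and on a mean and standard deviation taken over ten runs. A minimal sketch of both computations is given below. The function names are hypothetical, the zero-denominator guards are an addition of this sketch, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used. For multi-class datasets the counts would have to be gathered per class and averaged, which is not shown.

```python
from statistics import mean, stdev


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion counts (Eqs. 17-20)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guards against empty denominators are an addition of this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


def summarize_runs(scores):
    """Mean and sample standard deviation of per-run scores (assumed estimator)."""
    return mean(scores), stdev(scores)
```

For instance, `classification_metrics(8, 5, 2, 5)` yields an accuracy of 0.65, a precision of 0.8, a recall of 8/13, and an F1 of 16/23; `summarize_runs` can then be applied to the list of per-run accuracies or F1 scores of one model on one dataset.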
Fig. 9 The embedding of word nodes from the second GCL in maximum value label. In the above figure, points with the same color represent the same document category
Fig. 10 The embedding of word nodes from the second GCL in POS label. In the above figure, points with the same color represent the same POS tag
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Kowsari, K., JafariMeimandi, K., Heidarysafa, M., et al.: Text classification algorithms: a survey. Information 10(4), 150 (2019)
2. Mirończuk, M.M., Protasiewicz, J.: A recent overview of the state-of-the-art elements of text classification. Expert Syst. Appl. 106, 36–54 (2018)
3. Goudjil, M., Koudil, M., Bedda, M., et al.: A novel active learning method using SVM for text classification. Int. J. Autom. Comput. 15, 290–298 (2018)
4. Trstenjak, B., Mikac, S., Donko, D.: KNN with TF-IDF based framework for text categorization. Procedia Eng. 69, 1356–1364 (2014)
5. Shah, K., Patel, H., Sanghvi, D., et al.: A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augment. Hum. Res. 5, 1–16 (2020)
6. Li, Y., Yang, T.: Word embedding for understanding natural language: a survey. Guide Big Data Appl. 26, 83–104 (2018)
7. Vieira, J.P.A., Moura, R.S.: An analysis of convolutional neural networks for sentence classification. In: XLIII Latin American Computer Conference (CLEI), vol. 2017, pp. 1–5. IEEE (2017)
8. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101 (2016)
9. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5988–5999 (2017)
10. Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. Proc. AAAI Conf. Artif. Intell. 33(01), 7370–7377 (2019)
11. Malekzadeh, M., Hajibabaee, P., Heidari, M.: Review of graph neural network in text classification. In: IEEE 12th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), 2021, pp. 0084–0091. IEEE (2021)
12. Huang, L., Ma, D., Li, S., et al.: Text level graph neural network for text classification. arXiv preprint arXiv:1910.02356 (2019)
13. Liu, X., You, X., Zhang, X., et al.: Tensor graph convolutional networks for text classification. Proc. AAAI Conf. Artif. Intell. 34(05), 8409–8416 (2020)
14. Xue, B., Zhu, C., Wang, X., et al.: The study on the text classification based on graph convolutional network and BiLSTM. In: Proceedings of the 8th International Conference on Computing and Artificial Intelligence, ACM, pp. 323–331 (2022)
15. Fellbaum, C.: WordNet, Theory and Applications of Ontology: Computer Applications, pp. 231–243. Springer, Dordrecht (2010)
16. Dong, Y., Yang, Z., Cao, H.: A text classification model based on GCN and BiGRU fusion. In: Proceedings of the 8th International Conference on Computing and Artificial Intelligence, ACM, pp. 318–322 (2022)
17. Church, K.W.: Word2Vec. Nat. Lang. Eng. 23(1), 155–162 (2017)
18. Fang, F., Hu, X., Shu, J., et al.: Text classification model based on multi-head self-attention mechanism and BiGRU. In: 2021 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), pp. 357–361. IEEE (2021)
19. Devlin, J., Chang, M.W., Lee, K., et al.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
20. Ye, Z., Jiang, G., Liu, Y., et al.: Document and word representations generated by graph convolutional network and bert for short text classification. In: ECAI 2020, IOS Press, pp. 2275–2281 (2020)
21. Huang, J., Peng, M., Li, P., et al.: Improving biterm topic model with word embeddings. World Wide Web 23(6), 3099–3124 (2020)
22. Lin, Y., Meng, Y., Sun, X., et al.: Bertgcn: transductive text classification by combining gcn and bert. arXiv preprint arXiv:2105.05727 (2021)
23. Wang, K., Han, S.C., Poon, J.: InducT-GCN: inductive graph convolutional networks for text classification. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 1243–1249. IEEE (2022)
24. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media Inc, Sebastopol (2009)
25. Bouma, G.: Normalized (pointwise) mutual information in collocation extraction. Proc. GSCL 30, 31–40 (2009)
26. Ramos, J.: Using tf-idf to determine word relevance in document queries. Proc. First Instr. Conf. Mach. Learn. 242(1), 29–48 (2003)
27. Misra, D.M.: A self regularized non-monotonic activation function. arXiv preprint arXiv:1908.08681 (2019)
28. Soyalp, G., Alar, A., Ozkanli, K., et al.: Improving Text Classification with Transformer. In: 2021 6th International Conference on Computer Science and Engineering (UBMK), pp. 707–712. IEEE (2021)
29. Zhang, S., Tong, H., Xu, J., et al.: Graph convolutional networks: a comprehensive review. Comput. Soc. Netw. 6(1), 1–23 (2019)
30. Feng, Y., Cheng, Y.: Short text sentiment analysis based on multi-channel CNN with multi-head attention mechanism. IEEE Access 9, 19854–19863 (2021)
31. Joulin, A., Grave, E., Bojanowski, P., et al.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
32. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), ACL, pp. 1532–1543 (2014)
33. Van der Maaten, L., Hinton, G.: Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.