
Online Newspaper Analysis System

Hai Pham Van, Han Nguyen Nam, Huy Nguyen Quang, Kien Do Luong, Thanh
Nguyen Van, Tung Nguyen Duy,

Instructor: Professor Pham Van Hai

Hanoi University of Science and Technology, Vietnam

ABSTRACT
Online newspapers are experiencing exponential growth in the digital age, which leads to the proliferation of fake news and information bias. Addressing these challenges requires robust solutions leveraging advancements in Natural Language Processing (NLP). In this paper, we present a comprehensive pipeline for detecting fake news and performing Named Entity Recognition (NER) in online articles. Our methodology integrates cutting-edge NLP techniques, including Bidirectional Encoder Representations from Transformers (BERT), the Transformer architecture, BiLSTM, and Conditional Random Fields (CRF). We utilize BERT, a pre-trained model, for understanding article semantics, enhancing accuracy in downstream tasks. For fake news detection, we employ BERT in a classification architecture, leveraging both headline and content information to significantly improve prediction accuracy. Bidirectional Long Short-Term Memory (BiLSTM) is utilized for Named Entity Recognition due to its bidirectional sequence modeling capabilities, identifying potential entities efficiently. Additionally, we enhance NER predictions by incorporating a CRF model for structured prediction tasks. We also use Scrapy as a data crawling and preprocessing tool to extract features from online news articles, enriching our dataset and enhancing model performance in prediction and labeling tasks. The outcomes of our architecture are tested with contemporary datasets, namely the PhoNER COVID-19 Named Entity Recognition dataset and Kaggle datasets relating to fake news.

KEYWORDS: Fake news detection, sentiment analysis, BERT, LSTM, Transformer, NLP

I. Introduction

We are living in an era of industrialization and modernization. Alongside scientific and technological advancements, communication systems and media platforms are rapidly developing, providing easier access to information for people. Information and news are truly booming with a massive volume.

Fig. 1. Amount of data being generated each minute (2019)

As a natural consequence, news websites and platforms always deliver a huge amount of articles to readers. To attract readers, many news sites publish numerous news pieces with non-essential content, sensational headlines, or content that diverges from the article's actual substance. This inadvertently leads to readers being misled or receiving inaccurate information. Recent advances in Natural Language Processing models, data mining, and processing methodologies have also provided an important foundation and enable researchers to analyze and extract information from large amounts of text with ease.
This paper aims to delve into the realm of online news analysis, exploring various methodologies, techniques, and challenges associated with analyzing news content disseminated through digital platforms. By examining the intersection of technology, media, and society, we seek to uncover insights into how online news shapes public opinion, influences decision-making processes, and reflects societal trends and biases.
Through a comprehensive review of existing literature and case studies, this paper will provide a multidimensional understanding of the complexities involved in analyzing online news. From sentiment analysis and topic modeling to the detection of misinformation and understanding algorithmic biases, we will explore the diverse array of analytical approaches employed to dissect digital news content.
Furthermore, we will discuss the ethical considerations and implications of online news analysis, particularly concerning privacy, data security, and the spread of misinformation. By critically assessing the strengths and limitations of current methodologies, we aim to offer insights into future directions for research and practical applications in the field of online news analysis.
A. General purpose and scope

The primary objective of this paper is to undertake a multifaceted analysis of news articles, employing statistical methodologies to discern their tonal attributes. Through this analysis, the paper aims to provide readers with recommendations for reputable news sources, thereby facilitating informed consumption of information. Additionally, the research endeavors to develop robust mechanisms for the detection of both genuine and counterfeit news items, while also delving into the identification of their respective sources. Furthermore, the paper seeks to devise sophisticated keyword sets utilizing diverse combination rules, thereby enabling precise and efficient retrieval of articles that adhere to predetermined criteria. This research is delimited to the development of a web-based system designed to execute the aforementioned tasks with efficacy and accuracy. Emphasis will be placed on the creation of a user-friendly interface that grants users access to a curated selection of news articles, vetted for authenticity and credibility. The system, while not excessively intricate, will be endowed with a comprehensive suite of functionalities essential for the fulfillment of its objectives. This encompasses, but is not limited to, statistical algorithms for tone analysis, machine learning models for fake news detection, and sophisticated search mechanisms based on predefined keyword sets. The resultant system is envisioned as a reliable and indispensable tool for individuals seeking to navigate the vast landscape of digital news media with discernment and confidence.

B. Data mining

The lack of training data is always an important issue to address for any NLP project. In our case, we implemented Scrapy, a data crawling and processing tool that can automate the extraction of features of online news articles from numerous online news forums and newspapers.
With Scrapy, the crawled data consists of articles collected directly from over 100 news websites daily and aggregated for statistical analysis and keyword extraction. The data sources include articles from online news websites such as dantri.com.vn, vnexpress.vn, soha.vn, kenh14.vn, and other news websites. The basic data fields that need to be collected are:

Field name           Meaning
domain               Domain name of the site
publishedTime        Date on which the article was posted
title                Title of the article
content              Contents of the news
authorDisplayName    Name of the author
type                 The type of the news
...                  ...

Table 1: Data fields to be collected
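To make the collection of the fields in Table 1 concrete, the following is a minimal sketch of a Scrapy spider. The target domain, CSS selectors, and link patterns are hypothetical placeholders, not the actual configuration used by our system; they would have to be adapted to each news site's markup.

    # Minimal sketch of a Scrapy spider collecting the fields of Table 1.
    # Domain and selectors below are illustrative assumptions.
    import scrapy

    class NewsSpider(scrapy.Spider):
        name = "news"
        start_urls = ["https://example-news-site.vn/"]   # hypothetical source

        def parse(self, response):
            # Follow links that look like article pages (selector is a placeholder).
            for href in response.css("a.article-link::attr(href)").getall():
                yield response.follow(href, callback=self.parse_article)

        def parse_article(self, response):
            # Yield one item per article with the basic data fields.
            yield {
                "domain": response.url.split("/")[2],
                "publishedTime": response.css("time::attr(datetime)").get(),
                "title": response.css("h1::text").get(),
                "content": " ".join(response.css("article p::text").getall()),
                "authorDisplayName": response.css(".author::text").get(),
                "type": response.css(".category::text").get(),
            }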
C. General Architecture

In terms of architecture, our team has built an integrated system consisting of two main modules:
a) Data collection and processing system: Data will be collected from news websites on the internet using Scrapy, then temporarily stored by Kafka. Logstash will retrieve data from Kafka, process it through several steps, and classify the data before adding it to Elasticsearch.
b) Website system: The website allows users to analyze and track issues of interest from the data in Elasticsearch. The backend utilizes Spring Boot to connect with both SQL and Elasticsearch to retrieve data. The frontend is built using ReactJS.
Our result is a complete web application that enables users to create keyword sets for analyzing and tracking data from news websites, as well as review articles collected by the system.

II. Related Works

In order to perform statistical analysis, as well as to detect fake news by topic and category, the first main task is to determine what entities the news articles are targeting. This task is essentially named entity recognition.

A. Named Entity Recognition Task (NER)

1. Overview of NER

In the current era of information explosion, useful information extraction (IE) from data has become a prerequisite activity in all existing fields.
According to the paper "A survey on Named Entity Recognition — datasets, tools, and methodologies" [1], Named Entity Recognition, or NER, "is the process of identifying numerous segments of information referenced in a text and then classifying them into pre-established categories". Entities like "person", "organization", "region", and many more might be considered categories. Named Entity Recognition belongs to a broad category of NLP problems known as sequence tagging.
NER is a subtask of IE, tasked with automatically extracting named entities (such as person names, locations, organizations, dates/times, quantities, etc.) by identifying entities in a given text (Recognition) and determining their types (Classification).
Through NER, we can build a knowledge base. It can organize and arrange information in a way that is useful for humans and also provide useful inference for subsequent NLP algorithms.
NER can be implemented with the help of toolsets such as:
a) SpaCy: [2] This Python package is designed for high-level natural language processing (NLP) tasks and is open source. It enables the creation of programs that can effectively process and comprehend large volumes of text, making it well-suited for deployment in production environments. With support for over 72 languages and 80 pre-trained pipelines across nearly 24 languages, it facilitates the development of systems for information extraction and natural language understanding. Additionally, it promotes multitask learning using advanced models like BERT, offering state-of-the-art processing speed. Furthermore, it allows integration of custom models developed in TensorFlow, PyTorch, and other libraries. For named entity recognition, it offers a dedicated pipeline component called the entity recognizer. The NER module [3] is built on a model that makes use of residual convolutional neural networks and Bloom embeddings. There are models for the English language that were previously trained on the OntoNotes Release 5.0 package, including tagged text taken from news articles, blogs, and phone conversations.
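As a brief usage sketch of the spaCy entity recognizer described above, the snippet below runs a pre-trained English pipeline over one sentence; it assumes the small en_core_web_sm model has been downloaded and is only an illustration, not part of our system.

    # Sketch: named entity recognition with spaCy's entity recognizer.
    # Assumes `python -m spacy download en_core_web_sm` has been run.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

    for ent in doc.ents:
        # Each recognized entity exposes its surface text and predicted label.
        print(ent.text, ent.label_)
    # Typical output: "Apple ORG", "U.K. GPE", "$1 billion MONEY"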
b) NLTK: Recognized as a prominent Python platform for developing applications that leverage human language data, NLTK has gained prominence in the realm of natural language processing (NLP). As cited by Bird et al. (2009) [4], NLTK serves as a comprehensive toolkit encompassing various text processing functionalities, including parsing, categorization, tagging, stemming, tokenization, and semantic reasoning. Moreover, it offers named entity recognition (NER) capabilities alongside a vibrant community forum, fostering active discussions and knowledge exchange. NLTK is commonly utilized in both research endeavors and educational settings, serving as a valuable resource for training students in NLP techniques. Notably, its NER module employs a Maximum Entropy Classifier trained on the Automatic Content Extraction corpus.
c) PyTorch: Created by Facebook [5], PyTorch is another open-source deep learning package designed for Python. It serves as a premier platform for both industry and education purposes. PyTorch is mostly used for image recognition and language processing; by making machine learning more scalable, several AI models can be built quickly and efficiently.
Some of the other tools offered for industry-based projects are LingPipe, AllenNLP [6], IBM Watson [7], Intellexer [1], ParallelDots [8], and Dandelion API [9]. Some applications of NER include:
- Indexing documents and supporting search tools: integrating NER into document indexing and search tools enhances the efficiency, relevance, and sophistication of information retrieval systems by leveraging the semantic information encoded in text documents [10] [11].
- Customer support chatbots: By integrating NER, chatbots could potentially ask more relevant and contextually appropriate questions during conversations with users. This could lead to more engaging and satisfying interactions [12], and integrating NER into chatbots could enable them to better understand and respond to user queries involving named entities [13].
- Opinion mining: Named Entity Recognition (NER) can aid in opinion mining by identifying and extracting entities mentioned in opinions and also linking them to find certain meaningful relationships, which can provide valuable context and insights for sentiment analysis [14].

2. Basic steps of NER

Fig. 2. Basic steps of NER

Sentence segmentation [15] [16]: the process of dividing a string of written language into its component sentences. For example, sentences need to be separated whenever punctuation or period marks are found. The purpose of sentence segmentation is to delineate sentence boundaries.
Tokenization [17] [18]: the process of dividing a string of written language into its constituent words.
Part-of-speech tagging [15] [19]: the process of assigning a word in a text to its corresponding part-of-speech tag based on the context and its definition. It describes the characteristic structure of the vocabulary terms in a sentence or text.
Entity recognition [20]: the process of identifying key elements from the text and then classifying them into predefined categories. This is the crucial step of NER, as it fulfills its purpose.

B. Overview of BiLSTM Architecture

1. LSTM

LSTM, a famous variant of RNN, was proposed by Hochreiter and Schmidhuber in 1997 [21] as a solution to the aforementioned problem. The LSTM model consists of three gate networks, "input gate", "forget gate", and "output gate", which perform better than RNN.

Fig. 3. Basic architecture of LSTM

LSTM operates in a three-step operation:
Step 1: The first step in LSTM is to decide which information to discard from the cell at that specific time. This is determined with the help of a sigmoid function. It considers the previous state h_{t-1} and the current input x_t and computes the gate function.
Step 2: There are two functions in the second step. The first is the sigmoid function, and the second is the tanh function. The sigmoid function decides which values to let through (0 or 1). The tanh function assigns weights to the passed values, determining their importance from -1 to 1.
Step 3: The third step is to decide what the final output will be. First, a sigmoid layer needs to be run to determine which part of the cell state goes to the output. Then, the cell state needs to be passed through the tanh function to push the values between -1 and 1 and multiplied by the output of the sigmoid gate.
in a piece of text depends not only on the preceding
information of the word being considered but also on the
information following it. However, a traditional LSTM
architecture with a single layer can only predict the label of
the current word based on the information obtained from
the preceding words.

2. BiLSTM

Fig. 2. Basic steps of NER BiLSTM [22] was created to address the aforementioned
weakness. A BiLSTM architecture typically contains 2
Sentence segmentation [15] [16]: is the process of individual LSTM networks used simultaneously and
dividing a string of written language into its component independently to model the input sequence in 2 directions:

3
from left to right (forward LSTM) and from right to left model. We apply it with the negative log-linear function of
(backward LSTM). the function above.
# +
m
ź
LpX, λ, yq “ ´ log k k
P py |X , λq (4)
k“1

m
ÿ
»
n
ÿ
fi

— 1
— ÿ ffi
log — exp λj fj pX m , i, yi´1
k
, yik qffi
ffi
“´
– ZpXm q j
fl
k“1 i“1
(5)
Fig. 4. BiLSTM = forward LSTM + backward LSTM Calculate the partial derivative with respect to lambda to
find the minimum value of the log function because finding
the argmin value will achieve the maximum value for the
However, the BiLSTM architecture still does not fully entire negative log-linear function.
exploit the contextual semantics of the text.
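A compact PyTorch sketch of the idea: an embedding layer feeds a bidirectional LSTM, and the concatenated forward/backward hidden states are projected to per-token tag scores. The vocabulary size, dimensions, and tag count below are illustrative assumptions, not our trained configuration.

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        # Minimal BiLSTM sequence tagger: embeddings -> BiLSTM -> per-token tag scores.
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_tags=9):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden_dim, num_tags)  # forward + backward states

        def forward(self, token_ids):
            x = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
            out, _ = self.bilstm(x)         # (batch, seq_len, 2 * hidden_dim)
            return self.fc(out)             # (batch, seq_len, num_tags)

    # Usage sketch: per-token scores for a batch of 2 sentences of length 20.
    scores = BiLSTMTagger()(torch.randint(0, 10000, (2, 20)))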
C. Overview of CRF Model

CRF [23] is a probabilistic model for structured prediction tasks and has been applied successfully in many fields such as computer vision, natural language processing, bioinformatics, etc. In the CRF model, the nodes containing input data and the nodes containing output data are directly connected to each other, contrasting with the architecture of LSTM or BiLSTM, where inputs and outputs are indirectly connected through memory cells. CRF can be used for named entity labeling with input features being manually extracted features of a word, such as: the first letter is capitalized, all letters are capitalized, the preceding word is capitalized, the word under consideration, the preceding word, ...
In the CRF algorithm, we aim to maximize the conditional probability P(y|x) - the probability of having output vector y, given the input provided by random variable x - and we take the sequence with the highest probability.

\hat{y} = \operatorname{argmax}_{y} P(y \mid x)    (1)

In CRF, the input data is sequential, and we rely on the previous context to make predictions about a data point. We use feature functions that have multiple input values:
- The set of input vectors X.
- The position i of the data point we are predicting.
- The label of the data point i-1 in X.
- The label of the data point i in X.
The feature function f_j is defined as f_j(X, i, y_{i-1}, y_i). Each feature function is based on the labels of the previous and current word and takes on values of 0 or 1. We assign a set of weights (lambda values) to each function, which the algorithm will learn:

P(y \mid X, \lambda) = \frac{1}{Z(X)} \exp\Big( \sum_{j} \sum_{i=1}^{n} \lambda_j f_j(X, i, y_{i-1}, y_i) \Big)    (2)

where:

Z(X) = \sum_{y \in Y} \exp\Big( \sum_{j} \sum_{i=1}^{n} \lambda_j f_j(X, i, y_{i-1}, y_i) \Big)    (3)

is the normalization function.
To estimate the lambda parameters, we use Maximum Likelihood Estimation (MLE), a statistical method for estimating the parameter values of a CRF probability model. We apply it to the negative log of the likelihood function above.

L(X, \lambda, y) = -\log \prod_{k=1}^{m} P(y^k \mid X^k, \lambda)    (4)

= -\sum_{k=1}^{m} \log \Big[ \frac{1}{Z(X^k)} \exp\Big( \sum_{j} \sum_{i=1}^{n} \lambda_j f_j(X^k, i, y^k_{i-1}, y^k_i) \Big) \Big]    (5)

We calculate the partial derivative with respect to lambda to find the minimum of the negative log-likelihood, because finding the argmin of this function yields the maximum of the likelihood.

\frac{\partial L(X, y, \lambda)}{\partial \lambda} = -\frac{1}{m} \sum_{k=1}^{m} F_j(y^k, X^k) + \sum_{k=1}^{m} P(y^k \mid X^k, \lambda) F_j(y, X^k)    (6)

where:

F_j(y, X) = \sum_{i=1}^{n} f_j(X, i, y_{i-1}, y_i)    (7)

is the observed mean feature value, and

\sum_{k=1}^{m} P(y^k \mid X^k, \lambda) F_j(y, X^k)    (8)

is the expected feature value according to the model.
We use the partial derivative as a step in the gradient descent method. Gradient descent is an iterative optimization algorithm that updates the parameter values until convergence of lambda is found. The final gradient descent update equation for the CRF model is:

\lambda = \lambda + \alpha \Big[ \sum_{k=1}^{m} F_j(y^k, X^k) - \sum_{k=1}^{m} P(y^k \mid X^k, \lambda) F_j(y, X^k) \Big]    (9)

The CRF model used for entity labeling addresses the drawback of label bias caused by the independence of labels from each other in the Hidden Markov Model. CRF first identifies the necessary feature functions, initializes the lambda weights to random values, and then applies the gradient descent method iteratively until the parameter values converge.
However, a problem with linear-chain CRFs is that they only capture the dependencies between labels in a forward direction.

D. Overview of BERT Mechanism

1. Encoder and Decoder Process

Computers cannot learn directly from raw data such as images, text files, audio files, or video clips. Therefore, they require a process of encoding information into numerical form and decoding the numerical form into output results. These are the two processes called encoder and decoder:
Encoder: This is the phase of transforming input into learning features capable of learning tasks. For Neural Network models, the Encoder consists of hidden layers. For CNN models, the Encoder comprises a sequence of Convolutional + Maxpooling layers. For RNN models, the Encoder process mainly involves Embedding layers and Recurrent Neural Network layers.
Decoder: The output of the encoder is the input to the decoder. This phase aims to determine the probability distribution from the learning features at the encoder, thereby identifying the output labels. The result can be a single label for classification models or a sequence of labels in chronological order for seq2seq models.

Fig. 5. seq2seq model with encoding and decoding, for a machine translation task

2. Transformer architecture

The Transformer architecture was first introduced in the paper "Attention is All You Need" by Vaswani et al. [24], presented at the Conference on Neural Information Processing Systems (NeurIPS) in 2017. This architecture consists of two parts, an encoder on the left and a decoder on the right.
Encoder: a stacked combination of 6 defined layers. Each layer consists of two sub-layers within it. The first sub-layer is multi-head self-attention. The second sub-layer is simply a fully connected feed-forward layer. Note that we use a residual connection at each sub-layer, immediately followed by a normalization layer. This architecture has a similar idea to the ResNet network in CNN. The output of each sub-layer is LayerNorm(x + SubLayer(x)) with a dimension of 512.
Decoder: The decoder is also a stacked combination of 6 layers. The architecture is similar to the sub-layers in the Encoder except for the addition of one sub-layer representing the attention distribution at the first position. This layer is no different from the multi-head self-attention layer except that it is adjusted to avoid including future words in the attention. At the i-th step of the decoder, we only know the words at positions smaller than i, so this adjustment ensures that attention is only applied to words at positions smaller than i. The residual mechanism is also applied similarly to the Encoder.
Note that there is always an additional step of adding Positional Encoding to the inputs of the encoder and decoder to incorporate the time factor into the model, thereby increasing accuracy. This involves adding the position encoding vector of the word in the sentence to the word representation vector. We can encode it as a [0, 1] position vector or use sine and cosine functions [24].

3. Attention mechanism

3.1 Scaled dot-product attention

This is a self-attention mechanism [24] where each word can adjust its weight for other words in the sentence so that the weight is larger for words closer to it and gradually decreases for words further away.
After the word embedding step (passing through the embedding layer), we have the input of the encoder and decoder as a matrix X of size m x n, where m and n are the length of the sentence and the dimension of a word embedding vector, respectively.
The three matrices W_q, W_k, W_v are the parameters that the model needs to train. After multiplying these matrices with the input matrix X, we obtain the matrices Q, K, V (corresponding to the Query, Key, and Value matrices in the figure). The Query and Key matrices are used to compute the score distribution for word pairs. The Value matrix utilizes the score distribution to calculate the probability distribution output vector.
The input for calculating attention includes the matrix Q (each of its rows is a query vector representing an input word) and the matrix K (similarly to the matrix Q, each row is a key vector representing an input word). These two matrices Q, K are used to compute attention, where words in the sentence return attention to a specific word in the sentence. The attention vector is calculated as the weighted average of the value vectors in the matrix V, with attention weights computed from Q and K.
The attention function is:

\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\Big( \frac{QK^{T}}{\sqrt{d_k}} \Big) V    (10)

Dividing by \sqrt{d_k}, where d_k is the dimensionality of the key vector, aims to prevent overflow if the exponent is too large.

3.2 Multi-head attention

After the scaled dot-product process, we obtain an attention matrix. The parameters that the model needs to adjust are the matrices W_q, W_k, W_v. Each such process is called a head of attention. When repeating this process multiple times (in the paper, 3 heads), we obtain the Multi-head Attention process as the transformation below. Each branch of the input is a head of attention. In each branch, we perform scaled dot-product attention, and the output is an attention matrix.
After obtaining the three attention matrices at the output, we concatenate them along the columns to obtain the aggregated multi-head matrix with the same height as the input matrix, according to the formula:

\operatorname{MultiHead}(Q, K, V) = \operatorname{concatenate}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W_0, \quad \text{where } \mathrm{head}_i = \operatorname{Attention}(Q_i, K_i, V_i)

Note that, to return an output with the same size as the input matrix, we just need to multiply by the W_0 matrix with the same width as the input matrix.
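The following is a minimal PyTorch sketch of formula (10) and of the multi-head combination described above. It is a simplified illustration under assumed dimensions, not the reference implementation of [24].

    import math
    import torch

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in formula (10).
        d_k = Q.size(-1)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
        weights = torch.softmax(scores, dim=-1)
        return weights @ V

    def multi_head_attention(X, W_q, W_k, W_v, W_0, num_heads):
        # Project the input, split the projections into heads, attend per head,
        # then concatenate the head outputs and project back with W_0.
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        heads = []
        for Qh, Kh, Vh in zip(Q.chunk(num_heads, dim=-1),
                              K.chunk(num_heads, dim=-1),
                              V.chunk(num_heads, dim=-1)):
            heads.append(scaled_dot_product_attention(Qh, Kh, Vh))
        return torch.cat(heads, dim=-1) @ W_0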
4. Encoder and Decoder process in BERT mechanism

Thus, concluding process 3, we have completed the first sub-layer of the Transformer, which is the multi-head Attention layer. In the second sub-layer, we pass the result through fully connected layers and return an output with the same size as the input. The purpose is to be able to iterate these blocks N times.
After the backward propagation transformations, we often lose information about the position of words. Therefore, it is necessary to apply a residual connection to update previous information into the output. This method is similar to ResNet in CNN. To ensure stable training, we also apply another normalization layer immediately after the addition operation.
If we repeat the above layer block 6 times and denote them as one encoder, we can simplify the graph of the encoder process as shown below.
The decoder process is completely similar to the encoder except for the following:
- The decoder process generates tokens sequentially from left to right at each time step.
- At each block layer of the decoder, we need to add the final matrix from the encoder as an input to the multi-head attention.
- A Masked Multi-head Attention sub-layer is added at the beginning of each block layer. This layer is no different from the Multi-head Attention except that it does not consider the attention of future words.

5. Overview of BERT

BERT [25] is trained simultaneously on two different tasks: Masked Language Model and Next Sentence Prediction [25].
The BERTbase model utilizes 12 Transformer encoder blocks, while the BERTlarge model uses 24 Transformer encoder blocks (See Figure 6).

Fig. 6. Encoder blocks in two BERT architectures

The BERT architecture in the Masked Language Model task uses Transformer encoder blocks, which learn to predict the missing words at the [MASK] positions in the input, constituting 15% of the words in the sentence (See Figure 7).

Fig. 7. BERT architecture for MLM task

In the next sentence prediction task, BERT uses a Transformer model with two sentences as input and predicts the probability (True/False) of whether the two sentences are consecutive or not. During training, 50% of correct sentence pairs are mixed with 50% of randomly chosen sentence pairs to help BERT improve the accuracy of predicting the next sentence.
The BERT model uses the Cross Entropy loss function:

\mathrm{Loss} = -\sum_{i=1}^{\text{output size}} y_i \cdot \log \hat{y}_i    (11)

The masked language model task has an output size equal to the maximum length of the sentence, and the task of predicting whether the next sentence is present or not has an output size of 2. The loss value for updating the parameters is the total loss of both tasks.

E. Evaluation metrics

The table below is called the Confusion Matrix and contains the definitions of True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN). The confusion matrix is also discussed in the paper "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation" by Powers (2011) [26], which provides an in-depth exploration of various evaluation metrics used in classification tasks. The confusion matrix provides a tabular representation of the actual versus predicted class labels produced by a classifier. In the context of the paper, the confusion matrix serves as the foundation for calculating the various evaluation metrics discussed, including precision, recall, F-measure, ROC curve, informedness, markedness, and correlation coefficient. By organizing the true positive, false positive, true negative, and false negative predictions into a matrix, the confusion matrix allows for the computation of these metrics, which offer different perspectives on the performance of a classifier.

Fig. 8. Confusion matrix
1. Precision

Precision is defined as the ratio of the number of True Positives to the total number of predicted Positive instances (i.e., take the number of correctly predicted labels and divide it by the total number of predictions made for that label). A high precision means a high accuracy of correctly labeled instances.

\mathrm{Precision} = \frac{TP}{TP + FP}    (12)

2. Recall

Recall is defined as the ratio of the number of True Positives to the total number of actual Positive instances (i.e., take the number of correctly predicted labels and divide it by the total number of labels that are actually correct). A high recall means a high proportion of True Positives, indicating a low rate of entities not labeled as such.

\mathrm{Recall} = \frac{TP}{TP + FN}    (13)

3. F1 score

In practice, adjusting the model to increase Recall beyond a certain point may lead to a decrease in Precision, and vice versa, attempting to increase Precision can reduce Recall. In such cases, the F1 score serves as a useful measure of prediction success when dealing with imbalanced classes, calculated based on Precision and Recall.

\frac{2}{F_1} = \frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Recall}}
\iff F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}    (14)
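A short worked example of formulas (12)-(14); the two label vectors are made up purely for illustration.

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Hypothetical ground-truth and predicted labels (1 = fake, 0 = real).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # For the positive class: TP = 3, FP = 1, FN = 1, so
    # Precision = 3 / (3 + 1) = 0.75, Recall = 3 / (3 + 1) = 0.75, F1 = 0.75.
    print(precision_score(y_true, y_pred),
          recall_score(y_true, y_pred),
          f1_score(y_true, y_pred))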

III. Proposed Models

A. Integrated System Model proposal

1. Overview of the system

Our team has identified 3 main actors, and our system consists of subsystems which provide support for the functionality of the actors, such as: query and search for news, news statistics, and management functionality.
In addition, our system is also integrated with several AI subsystems regarding object detection, keyword detection, as well as fake news detection.
The actors interact with the system through the web interface.
We also provide a database relating to news, a Users database, and a database for training the models.

2. Architectural System Design

Our system proposes three main modules (See Figure 9): the Data Mining and Preprocessing Module; the AI Module, which provides services for detecting fake news and keyword extraction; and the Website Module, which provides the other services.
As for the description of our architectural system design, the actors include Customers, Users, and Administrators interacting with the system through web browsers, while for integrated platforms, the frontend server of the website provides interaction with the following services: Search, Generating statistics, News following, and other APIs for Administrators, User personal information, and Account.
Specifically, the APIs relating to user personal information help manage the user's history of accessing and following news. The APIs for Admin and Account provide account management, login/logout actions, authentication, and permissions. Services and APIs communicate through a mutual database. In addition, a news database is constructed through a news integration module with a procedure containing the steps: retrieval, integration, and data cleaning.
In terms of deployment, the system is deployed on a personal server and hosted locally. The SQL database is deployed on PostgreSQL, and the other services are deployed on Docker.
Actors accessing the components of the system, from the frontend to the APIs and database, are always required to authenticate and be given the right permissions.

3. Building the system

The system's use cases: See Figure 10 and Figure 11.

Fig. 9. Architectural System Design

Fig. 10. Use cases for the Users Actor

Fig. 11. Use cases for the Administrator Actor

B. Integrated architecture details

1. Data mining and processing module

1.1 Data flow

Data in the form of articles, after being crawled, is pushed into Kafka. After that, these articles are classified into two states: Positive and Negative. Within this project, we perform sentiment analysis based on a set of rules. Data is processed through Logstash, with the input being the data modified and processed through Kafka's filters. The output data is pushed to Elasticsearch.
The data is stored in Elasticsearch and indexed by date to make it convenient for data analysis, since we regard the date as the base time unit, and also in order to reduce the amount of data stored in one particular index.
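A minimal sketch of how one article could be written into such a date-based index with the official Python client; the broker address, index naming scheme, and field names are assumptions for illustration, not our production configuration.

    # Sketch: indexing one crawled article into a per-day Elasticsearch index
    # (elasticsearch-py 8.x style call; cluster address is a local assumption).
    from datetime import date
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    article = {
        "domain": "vnexpress.vn",
        "title": "Example headline",
        "content": "Example body text ...",
        "publishedTime": "2024-01-15T08:00:00",
        "sentiment": "Positive",
    }

    # One index per day, e.g. "articles-2024-01-15", so data can be queried
    # or pruned by date and no single index grows too large.
    index_name = f"articles-{date(2024, 1, 15).isoformat()}"
    es.index(index=index_name, document=article)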
1.2 Data mining and Processing Component

1.2.1 Data Crawling

In order to crawl data [27], we use the Scrapy framework. Scrapy is an open-source framework used to extract data from websites on the internet. It can be used for different purposes, from extracting data to monitoring and automated testing [28]. First of all, the Scrapy Engine is responsible for controlling all the data flow between system components and activating events when particular actions occur. The second component is the Scheduler, which is similar to a queue; the scheduler arranges the order of the URLs to be downloaded. Finally, a Downloader performs the downloading of the websites and provides them as data to the engine. Another important concept is the Spiders: Spiders are classes written by the users; they are responsible for the necessary data parsing and create new URLs that are fed back to the scheduler through the engine. The Item Pipeline is a fundamental part of this architecture: all the data parsed by the Spiders is sent there, and the Item Pipeline has the task of processing the items and storing them in the database.
Middlewares
The pipeline also includes middlewares, which are components placed between the Engine and the other components. Their purpose is to help the user customize and expand the processing capability of each component. For example, after downloading a URL, if you want to provide tracking and send information right at that moment, you can expand and modify the configuration. In this case, Spider middlewares are used: they are the components between the Scrapy Engine and the Spiders, handling the responses passed to the Spiders and the Spiders' output (new items and URLs). All three important components of our pipeline have important middlewares. For example, Downloader Middlewares sit between the Engine and the Downloader, effectively handling the requests pushed from the Engine and the responses generated by the Downloader, while Scheduler middlewares sit between the Engine and the Scheduler to handle requests from both sides.
Data flow
As for the data flow, when beginning to crawl from a website, the Engine identifies the domain name, finds the location of a Spider, and requests from the Spider the first URLs to crawl. After that, the Engine receives a list of the first URLs from the Spider and sends it to the Scheduler for arrangement. Next, the Engine requests the next list of URLs from the Scheduler and sends them to the Downloader (requests). The Downloader receives the requests and performs the downloading of the pages. After it has finished downloading, it creates a response and sends it back to the Engine. Responses from the Downloader are pushed from the Engine to the Spiders for analysis. At the Spiders, when a response is received, they begin to extract the information from the response (title, content, author, date published, ...) and handle the URLs that have the potential to be crawled, pushing them back to the Engine (requests). At this point, the Engine receives the results from the Spiders for two main tasks: pushing the parsed data to the Item Pipeline, where it is processed and saved into the database, and pushing the new URLs to the Scheduler, returning to step 3.

1.2.2 Kafka

Apache Kafka [27] is a distributed messaging system. Kafka is developed and maintained by Apache, which is why Kafka (a message broker) has the name Apache Kafka. Similar to other message brokers, it is developed following the publish/subscribe model. The party that publishes the data is called the producer, and the party receiving the data, divided by topics, is called the consumer.
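A small publish/subscribe sketch using the kafka-python client; the topic name, broker address, and message shape are assumptions made only to illustrate the producer/consumer roles described above.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer side: the crawler publishes each parsed article to a topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("raw-articles", {"title": "Example headline", "content": "..."})
    producer.flush()

    # Consumer side: a downstream service subscribes to the same topic
    # and reads the articles for further processing.
    consumer = KafkaConsumer(
        "raw-articles",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    for message in consumer:
        print(message.value["title"])
        break  # stop after one message in this sketch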
1.2.3 Logstash

Logstash [29] is an open-source application belonging to the ELK Stack ecosystem, with a very important role. Its log pipeline consists of three stages corresponding to three modules:
• INPUT: Receive and collect raw event log data from various sources such as files, Redis, RabbitMQ, Beats, syslog, ...
• FILTER: After receiving the data, perform event log data operations (such as adding, deleting, or replacing log content) according to the administrator's configuration to rebuild the log event data structure as desired.
• OUTPUT: Finally, forward the event log data to other services, such as Elasticsearch, for log reception, storage, or display.

Fig. 12. HT LogStash Flow

At the INPUT stage, Logstash is configured to select the form of receiving log events or fetching log data remotely as needed. After obtaining the log events, the INPUT step writes the event data to a centralized queue in RAM or on disk.
Each pipeline worker thread then retrieves a batch of events from this queue and processes it in the FILTER stage, which restructures the log data to be sent to the OUTPUT stage. The number of events processed per batch and the number of pipeline worker threads can be configured for optimal tuning.
By default, Logstash uses an in-memory queue in RAM between stages (input -> filter and filter -> output) as a buffer to store event data before processing. If the Logstash service program is stopped for some reason in the middle, the event data in the buffer will be lost.

1.2.4 Elasticsearch

Elasticsearch [30] is a distributed search and analytics tool built on Apache Lucene. Elasticsearch was first introduced in the paper titled "ElasticSearch: A Distributed, RESTful Search Engine" by Shay Banon [31]. This paper was presented at the 2010 Berlin Buzzwords Conference. It outlines the design, architecture, and features of Elasticsearch, emphasizing its distributed nature, scalability, and support for real-time search and analytics. The paper highlights Elasticsearch's RESTful API, which allows users to interact with the search engine using HTTP requests, and discusses its use cases in various domains, including log analysis, full-text search, and business intelligence. Since its launch in 2010, Elasticsearch has quickly become the most popular search tool and is widely used for use cases related to log analysis, full-text search, security information, business analytics, and operational information. Elasticsearch is considered a search engine and inherits from Apache Lucene. Elasticsearch essentially functions as a web server with fast search capabilities through the RESTful protocol. It possesses highly efficient data analysis and statistical capabilities, runs on its own server platform, and communicates via RESTful APIs, so it is not overly dependent on what client or existing system you have written. Therefore, integrating it into your system becomes easier: you only need to send an HTTP request, and it will return the results. Elasticsearch is also a distributed system with incredible scalability and is an open-source system.

2. AI Module

2.1 BERT Model

BERT (Bidirectional Encoder Representations from Transformers) stands for a model that represents words bidirectionally using Transformer techniques. BERT is designed to pre-train word embeddings. One notable feature of BERT is its ability to balance context in both left and right directions.
The attention mechanism of the Transformer passes all words in the sentence simultaneously into the model at once without considering the direction of the sentence. Therefore, the Transformer is considered bidirectional training, although in reality, it is more accurate to say it is non-directional training. This feature allows the model to learn the context of a word based on all surrounding words, including both left and right words.
One unique aspect of BERT that previous embedding models did not have is the ability to fine-tune the training results. We can add an output layer to the model architecture to customize it for specific training tasks.

Fig. 13. Pre-train and Fine-tuning process of BERT

Currently, there are many different versions of BERT. All the versions are based on changes to the Transformer architecture focusing on three parameters:
• L: the number of block sub-layers inside the transformer
• H: the size of the embedding vectors (aka hidden size)
• A: the number of heads inside a multi-head layer; each head performs a self-attention mechanism
There are two main architectures with different names: BERTbase (L = 12, H = 768, A = 12), which has 110 million parameters in total, and BERTlarge (L = 24, H = 1024, A = 16), which has 340 million parameters in total. In the BERTlarge architecture, the number of layers has been doubled, the hidden size has been increased by a factor of 1.33, and the number of heads in the multi-head layers is also 1.33 times that of BERTbase.
BERT also takes the classification token - CLS for short - as the input, followed by the word sequence. After that, it transports the input to the upper layers. Each layer applies self-attention and passes the result through a feed-forward network, then hands it over to the next encoder layer. The output of the model is a vector sized according to BERT's hidden dimension. If we want to extract a classifier from this model, we can take the output corresponding to the CLS classification token.

Fig. 14. Input illustration for BERT

2.2 Fake news detection with BERT

As stated in section 4.2.1, the input of the BERT model is a classification token, so we need to tokenize the values to use as input for the model. For the BERT Classification model architecture for classifying fake news, our group proposes the following:

Fig. 15. BERT Classification architecture for fake news detection

With the BERT model in use, the input is the headline and content of the news article, and the output is the classification result indicating whether the news is fake or not.
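A sketch of this idea using the Hugging Face transformers library: the headline and content are passed to the tokenizer as a sentence pair, and a sequence classification head on top of BERT produces the fake/real prediction. The checkpoint name and the label mapping below are assumptions for illustration, not the exact configuration our group trained.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Any BERT-style checkpoint fine-tuned for 2-way classification would fit here.
    model_name = "bert-base-uncased"   # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    headline = "Example headline of a news article"
    content = "Example body text of the article ..."

    # Headline and content are encoded as a sentence pair:
    # [CLS] headline [SEP] content [SEP]
    inputs = tokenizer(headline, content, truncation=True,
                       max_length=512, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    prediction = logits.argmax(dim=-1).item()   # assumed mapping: 0 = real, 1 = fake
    print("fake" if prediction == 1 else "real")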
2.3 Keyword Extraction using KeyBert

KeyBERT [32] is a minimalistic and easy-to-use keyword extraction technique that leverages BERT embeddings to generate keywords and keyword phrases most similar to the document.
There are four steps that KeyBERT [33] completes to make predictions.
Step 1: The input document is embedded using a pre-trained BERT model. This turns a piece of text into a fixed-size vector that represents the semantic aspect of the document.
Step 2: Keywords and phrases (n-grams) are extracted from the same document using Bag of Words techniques (such as the TF-IDF Vectorizer or CountVectorizer). This is a classical step to perform keyword extraction.

Fig. 16. Demonstration for steps 1 and 2 of KeyBert

Step 3: After that, each keyword is embedded into a fixed-size vector using the same model that we used to embed the article.

Fig. 17. Demonstration for step 3 of KeyBert

Step 4: Now that the keywords and the document are represented in the same space, KeyBERT computes the cosine similarity between the keyword embeddings and the document embedding. Then, the most similar keywords (with the highest cosine similarity scores) are extracted.

Fig. 18. Demonstration for step 4 of KeyBert

Additionally, KeyBERT performs diversification of the selected keywords. Without applying these techniques, the model may select very similar words or phrases as keywords. For example, if there is an article about fruits containing various types of fruits and this technique is not applied, the top three keywords could be "green apple," "big apple," and "yellow apple." Instead, we may want the model to include various types of fruits in the top keywords, even if one type is more common in the article. There are two main metrics we would like to discuss. Firstly, Max Sum Similarity (MSS): with this method, a parameter top_n, representing the number of top keywords, is set to a value (e.g. 20). Then, 2 x top_n keywords are extracted from the document. Pairwise similarities are calculated between these keywords. Finally, this method extracts the most relevant and least similar keywords. The second metric is Maximal Marginal Relevance (MMR): this method is similar to the previous one, but it adds a parameter to represent diversity. MMR aims to minimize redundancy and maximize diversity in text summarization tasks. It starts by selecting the keyword most similar to the document. Then, it iteratively selects new candidates that are similar to the document but dissimilar to the already chosen keywords.
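For reference, the keybert package exposes these four steps and both diversification strategies through a single call; the document text and parameter values below are illustrative assumptions rather than our production settings.

    from keybert import KeyBERT

    doc = "Example article text about different kinds of fruit ..."
    kw_model = KeyBERT()   # uses a default sentence-transformers model for embeddings

    # Maximal Marginal Relevance: trade off relevance against diversity.
    keywords_mmr = kw_model.extract_keywords(
        doc, keyphrase_ngram_range=(1, 2), use_mmr=True, diversity=0.5, top_n=5)

    # Max Sum Similarity: pick the least similar set among 2 x top_n candidates.
    keywords_mss = kw_model.extract_keywords(
        doc, keyphrase_ngram_range=(1, 2), use_maxsum=True, nr_candidates=20, top_n=5)

    print(keywords_mmr)
    print(keywords_mss)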
Fig. 19. Keyword extraction using KeyBERT

With the KeyBert model in use, the input of the model is news headlines and the output is the keywords corresponding to the headlines.

2.4 NER task with BiLSTM model combining CRF

The combined architecture of BiLSTM and CRF both addresses the drawbacks and leverages the advantages of each model in the NER task.

Fig. 20. Character Embeddings

Fig. 21. BiLSTM + CRF architecture

The model works through two main steps. Firstly, word embeddings of each word and character are fed into the BiLSTM layer to extract useful information about the semantics and morphology of the word and the context surrounding that word. Secondly, the CRF layer processes this information as features to make predictions about the NER label of each word. In addition to the information received from the BiLSTM layer, CRF also relies on information from previously predicted labels. For example, if the previous label is B-LOCATION, it is highly likely that the current word being considered will have the label I-LOCATION.
The learning parameters of the BiLSTM+CRF architecture include: the word embedding matrix, the weight matrix of the BiLSTM layer, and the transition matrix of the CRF layer. All of these parameters are updated during training on labeled data through the back-propagation algorithm with Stochastic Gradient Descent (SGD).
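A compact sketch of this architecture in PyTorch. The CRF layer here comes from the third-party pytorch-crf package (torchcrf), and all sizes are illustrative assumptions rather than our trained configuration.

    import torch
    import torch.nn as nn
    from torchcrf import CRF   # assumed dependency: pip install pytorch-crf

    class BiLSTMCRF(nn.Module):
        # Embeddings -> BiLSTM -> linear emission scores -> CRF transition layer.
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_tags=21):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            self.emissions = nn.Linear(2 * hidden_dim, num_tags)
            self.crf = CRF(num_tags, batch_first=True)

        def loss(self, token_ids, tags, mask):
            out, _ = self.bilstm(self.embedding(token_ids))
            # Negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(self.emissions(out), tags, mask=mask)

        def decode(self, token_ids, mask):
            out, _ = self.bilstm(self.embedding(token_ids))
            # Viterbi decoding returns the best tag sequence per sentence.
            return self.crf.decode(self.emissions(out), mask=mask)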

3. Backend

To build backend, our group opted for Spring Boot [34].


Spring Boot is a module of Spring Framework, providing
feature RAD (Rapid Application Development). Spring
Boot is used to create independent applications based on
Spring. Spring Boot does not require XML configuration. It
Fig. 19. Keyword extraction using KeyBERT is a standard configuration for software, which helps
increase productivity for developers. Advantages of Spring
Boot are that they have features of Spring Framework and
With the KeyBert model in use, the input of the model is
can create independent applications run by java -jar (also
news headlines abd the output are keywords corresponding
for java web). Furthermore, Spring boot also possess
to the headlines.
directly embedded applications server (Tomcat, Jetty...) so
no need to deploy war file. They also have few
2.4 NER task with BiLSTM model combining CRF configuration, automatically configure whenever possible
The combined architecture of BiLSTM and CRF both (Reduce coding time, increase productivity) and do not
addresses the drawbacks and leverages the advantages of require XML config. Spring Boot also provide many
each model in the NER task. plugins and is an industry standard for Microservices
(Cloud support; reduce setup, config; support libraries...)

4. Frontend

Our team uses ReactJS to develop the frontend. ReactJS is


an open-source library developed by Facebook, released in
2013. It is a JavaScript library used to build interactions with
components on websites. One of the most notable features
of ReactJS is the ability to render data not only on the server
side but also on the client side.

Fig. 20. Character Embeddings


5. Docker

Our team utilizes Docker to deploy components in the


HT system. Each component is built into corresponding
containers.
Docker is a software platform that allows for fast
building, testing, and deployment of applications. Docker
packages software into standardized units called containers
that include everything the software needs to run,
including libraries, system tools, code, and runtime. By
using Docker, applications can be quickly deployed and
scaled into any environment, with the assurance that the
Fig. 21. BiLSTM + CRF architecture code will run.

IV. Experimental Results

A. Model Evaluation
1. Experiment environment
The team is currently training models with the datasets in a Python environment using machine learning and deep learning libraries such as Scikit-learn, TensorFlow [35], and PyTorch. The team runs the training process on local machines and Google Colab with the following configurations:

Fig. 22. Local computer configuration

Fig. 23. GPU configuration used on Colab

2. Dataset
2.1 PhoNER_COVID_19
PhoNER_COVID19 [36] is a dataset for identifying
named entities related to COVID-19 in Vietnamese, built
and published by VinAI Research.
The data was collected from articles tagged with the
keywords "COVID-19" or "COVID" from reputable online
news sources in Vietnam (VnExpress, ZingNews, BaoMoi,
and ThanhNien) from February 2020 to August 2020.
Subsequently, the main text content of the articles was
segmented into sentences using the RDRSegmenter from
VnCoreNLP. Sentences related to COVID-19 patients were
selected using BM25Plus. Then, they manually filtered out
sentences that did not contain relevant information about
COVID-19 patients in Vietnam, resulting in 10,027 raw
sentences.
Next, the data was manually labeled with a clear process.
The labeled results were reviewed and could be amended if
necessary. Finally, from the 10,000 raw text sentences,
35,000 entities were obtained. The authors divided the
dataset into corresponding Train/Validation/Test sets with
a ratio of 5/2/3.
PhoNER_COVID_19 is published at:
https://github.com/VinAIResearch/PhoNER_COVID19

2.2 Real-world Fake news:


Originating from a community competition on Kaggle in
2018, "Build a system to identify unreliable news articles,"
the dataset was published, consisting of 2 files: train.csv and
test.csv.
The news data collected is a collection of fake and real
news articles propagated during the 2016 United States
Presidential Election campaign.
The training data (train.csv) consists of 5 fields: news id
(id), news title (title), news author (author), news content
(text), and the label of the news (label). It contains 10,540
real news articles and 10,260 fake news articles.

3. Evaluation Results

Model           PATIENT-ID  AGE  GENDER  DATE  OCCUPATION  NAME  LOCATION  ORGANIZATION  SYMPTOM  TRANSPORTATION
lstm-syllable   96          97   95      98    66          86    90        87            83       92
lstm-word       97          98   95      99    69          88    88        84            83       93
bert-syllable   95          93   94      98    56          90    84        77            84       83
bert-word       97          95   94      99    75          90    86        80            85       85

Table 2: Precision score

Model           PATIENT-ID  AGE  GENDER  DATE  OCCUPATION  NAME  LOCATION  ORGANIZATION  SYMPTOM  TRANSPORTATION
lstm-syllable   96          91   90      96    55          82    94        78            88       93
lstm-word       95          88   86      96    57          85    93        80            86       95
bert-syllable   95          80   85      98    60          93    86        71            87       82
bert-word       96          82   86      99    65          93    86        76            88       83

Table 3: Recall score

Model           PATIENT-ID  AGE  GENDER  DATE  OCCUPATION  NAME  LOCATION  ORGANIZATION  SYMPTOM  TRANSPORTATION
lstm-syllable   96          92   92      96    60          84    92        82            85       92
lstm-word       96          92   90      97    62          86    90        81            84       94
bert-syllable   95          86   89      98    58          91    85        74            85       82
bert-word       97          88   90      99    70          91    86        78            86       84

Table 4: F1 score

In the above tables, lstm-syllable means applying the BiLSTM+CRF model to the syllable-level dataset and lstm-word indicates applying the BiLSTM+CRF model to the word-level dataset.

Fig. 24. Comparing F1 score of BiLSTM + CRF model

Fig. 25. Accuracy comparison between our group’s model

3.1 NER task

It can be observed that the results of BiLSTM+CRF applied at the word level are better than at the syllable level for our team. However, the results of BiLSTM+CRF at the word level from VinAI Research [36] [37] consistently outperform ours, especially for the OCCUPATION (JOB) label. Furthermore, predictions are far more accurate for fields such as PATIENT_ID, AGE, and GENDER, and achieve fewer correct guesses for fields like JOB.
These results demonstrate (Figure 24) that a more detailed approach at the word level can potentially yield more promising outcomes for our team in the future. Another observation is that the BERT model at the word level has also shown good results, while the syllable-level model is outperformed in many categories such as OCCUPATION, ORGANIZATION, and LOCATION.

3.2 Fake news Detection task


Due to the lack of publicly available datasets for fake news in Vietnamese, our team used fake news datasets in English. The results (See Figure 25) show that the BERT model our team used performs quite well, since the prediction results achieve above 90% accuracy. However, these are evaluation results on an English-language news dataset, so it cannot be confirmed that the model would achieve similarly high accuracy on a Vietnamese-language dataset.

B. Experiment and Deployment


1. Deploy
Our team has been able to deploy a website with the following interface (See Figure 26, Figure 27, Figure 28, Figure 29):

Fig. 26. Login interface

Fig. 27. Dashboard interface containing all the dashboards in use by the user

Fig. 28. Home page after login

Fig. 29. Detailed charts in a dashboard

This is the link of the demo video: https://youtu.be/cbWDSplfBHY

2. Remarks

The web demo has fully met the functional requirements of the tasks.
The web demo has a user-friendly and clear interface.
However, the functionality of detecting real/fake news is not very easy to explain to users, meaning users do not fully understand why the system classifies a news article A from website B as real/fake.

V. Conclusion

A. Conclusion

The report has summarized the process by which the team has explored and developed an integrated system solution for the news analysis problem. First of all, the team researched integrated system solutions and accompanying technologies. Moreover, the team explored some new deep learning models and integrated AI solutions into the system to enhance its usability. The results obtained by the team are quite satisfactory.
However, due to various objective and subjective reasons, the team still has some limitations and shortcomings in implementing this topic: the results from the AI solutions are not easy to explain to users.

B. Future Development

In the future, our team can explore and implement the following directions:
Firstly, regarding integrated solutions: further explore other integration solutions, such as chatbots and news recommendation systems, to personalize user experiences.
Secondly, concerning the quality of the AI modules: delve deeper into the domain of news data, gain a better understanding of the structural levels of articles and texts, and explore and execute data enrichment to enable the models to learn more. Implement the construction of real and fake news datasets in Vietnamese and proceed with training.

References

[1] B. Jehangir, S. Radhakrishnan, R. Agarwal, "A survey on named entity recognition — datasets, tools, and methodologies," Natural Language Processing Journal, vol. 3, p. 100017, 2023.
[2] L. Tran, "Data scraping application with Scrapy," 2023.
[3] S. Vychegzhanin, E. Kotelnikov, "Comparison of named entity recognition tools applied to news articles," in 2019 Ivannikov Ispras Open Conference (ISPRAS), 2019, pp. 72–77, IEEE.
[4] S. Bird, E. Klein, E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc., 2009.
[5] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., "PyTorch: An imperative style, high-performance deep learning library," Advances in Neural Information Processing Systems, vol. 32, 2019.
[6] M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. Liu, M. Peters, M. Schmitz, L. Zettlemoyer, "AllenNLP: A deep semantic natural language processing platform," arXiv preprint arXiv:1803.07640, 2018.
[7] D. A. Ferrucci, "Introduction to 'This is Watson'," IBM Journal of Research and Development, vol. 56, no. 3.4, pp. 1–1, 2012.
[8] A. Jain, I. Aggarwal, A. Singh, "ParallelDots at SemEval-2019 Task 3: Domain adaptation with feature embeddings for contextual emotion analysis," in Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 185–189.
[9] A. Dandelion, "Dandelion API," [Online]. Available: https://dandelion.eu [Last accessed: 23 January 2016], 2021.
[10] A. Brandsen, S. Verberne, K. Lambers, M. Wansleeben, "Can BERT dig it? Named entity recognition for information retrieval in the archaeology domain," Journal on Computing and Cultural Heritage (JOCCH), vol. 15, no. 3, pp. 1–18, 2022.
[11] A. Caravale, P. Moscati, N. Duran-Silva, B. Grimau, B. Rondelli, "Developing a digital archaeology classification system using natural language processing and machine learning techniques," Archeologia e Calcolatori, vol. 34, no. 2, 2023.
[12] S. Reshmi, K. Balakrishnan, "Enhancing inquisitiveness of chatbots through NER integration," in 2018 International Conference on Data Science and Engineering (ICDSE), 2018, pp. 1–5, IEEE.
[13] E. M. Kusumaningtyas, E. R. Laurentino, A. R. Barakbah, "Responsive chatbot using named entity recognition and cosine similarity," 2023.
[14] J. Shin, E. Jo, Y. Yoon, J. Jung, "A system for interviewing and collecting statements based on intent classification and named entity recognition using augmentation," Applied Sciences, vol. 13, no. 20, p. 11545, 2023.
[15] C. Manning, H. Schutze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[16] B. Minixhofer, J. Pfeiffer, I. Vulić, "Where's the point? Self-supervised multilingual punctuation-agnostic sentence segmentation," arXiv preprint arXiv:2305.18893, 2023.
[17] D. Jurafsky, J. H. Martin, "Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition."
[18] B. Elov, S. M. Khamroeva, Z. Xusainova, "The pipeline processing of NLP," in E3S Web of Conferences, vol. 413, 2023, p. 03011, EDP Sciences.
[19] P. Gholami-Dastgerdi, M.-R. Feizi-Derakhshi, "Part of speech tagging using part of speech sequence graph," Annals of Data Science, vol. 10, no. 5, pp. 1301–1328, 2023.
[20] D. Nadeau, S. Sekine, "A survey of named entity recognition and classification," Lingvisticae Investigationes, vol. 30, no. 1, pp. 3–26, 2007.
[21] S. Hochreiter, J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[22] M. Schuster, K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[23] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, P. H. Torr, "Conditional random fields as recurrent neural networks," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1529–1537.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[25] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[26] D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," arXiv preprint arXiv:2010.16061, 2020.
[27] V. Cothey, "Web-crawling reliability," Journal of the American Society for Information Science and Technology, vol. 55, no. 14, pp. 1228–1238, 2004.
[28] J. Wang, Y. Guo, "Scrapy-based crawling and user-behavior characteristics analysis on Taobao," in 2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2012, pp. 44–52, IEEE.
[29] J. Turnbull, The Logstash Book. James Turnbull, 2013.
[30] M.-P. Scott-Boyer, P. Dufour, F. Belleau, R. Ongaro-Carcy, C. Plessis, O. Périn, A. Droit, "Use of elasticsearch-based business intelligence tools for integration and visualization of biological data," Briefings in Bioinformatics, vol. 24, no. 6, p. bbad348, 2023.
[31] A. ElasticSearch, "Distributed RESTful search engine," 2012.
[32] KeyBERT documentation, https://maartengr.github.io/KeyBERT/.
[33] A. Besbes, "How to extract relevant keywords with KeyBERT." [Online]. Available: https://towardsdatascience.com/how-to-extract-relevant-keywords-with-keybert-6e7b3cf889ae.
[34] M. Mythily, V. R. Kanakala, R. Nambiar, et al., "An extensive review of Spring Boot testing based on business requirements of the software," in 2023 4th International Conference on Smart Electronics and Communication (ICOSEC), 2023, pp. 1547–1553, IEEE.
[35] S. Pattanayak, Pro Deep Learning with TensorFlow 2.0: A Mathematical Approach to Advanced Artificial Intelligence in Python. Springer, 2023.
[36] D. Q. Nguyen, A. T. Nguyen, "PhoBERT: Pre-trained language models for Vietnamese," arXiv preprint arXiv:2003.00744, 2020.
[37] T. H. Truong, M. H. Dao, D. Q. Nguyen, "COVID-19 named entity recognition for Vietnamese," arXiv preprint arXiv:2104.03879, 2021.
