Information Extraction On Tourism Domain Using SpaCy and BERT
ECTI Transactions on Computer and Information Technology, Vol. 15, No. 1, April 2021
DOI: 10.37936/ecti-cit.2021151.228621
Received December 11, 2019; revised March 18, 2020; accepted June 18, 2020; available online January 5, 2021
• We demonstrate the use of two frameworks: BERT and SpaCy. Both provide pre-trained models and basic libraries which can be adapted to serve our goals.
• A tourism domain is considered, since our final goal is to construct a tourism ontology. A tourism data set is used to demonstrate the performance of both methods. The sample public tagged data is provided as a guideline for future adaptation.

It is known that extracting data into an ontology usually requires a great deal of human work. Several previous works have attempted to propose methods for building ontologies based on data extraction [3]. Most of the work relies on the web structure of documents [4-6]. The ontology is extracted based on the HTML web structure, and the corpus is based on WordNet. The methodology can be used to extract the required types of information from full texts, either for inserting individuals into the ontology or for other purposes such as tagging or annotation.

When creating machine learning models, the difficult part is the data annotation for the training data set. For named entity recognition, the tedious tasks are corpus cleansing and tagging. Text classification is easier because topics or types can be used as class labels. We will describe the overall methodology, which starts from data set gathering by web information scraping. The data is selected based on the HTML tags for corpus building. The data is then used for model creation for automatic information extraction tasks. The preprocessing for training data annotation is discussed. Various tasks and tools are needed for the different processes. Then, model building is the next step, where we demonstrate the performance of two tools used in information extraction tasks, SpaCy and BERT.

Section 2 presents the background and previous works. In Section 3, we describe the whole methodology, from data preparation for information tagging through model building. Section 4 describes the experimental results and discussion. Section 5 concludes the work and discusses future work.

2. BACKGROUNDS

We give examples of existing tourism ontologies to explore the typical types of tourism information. Since we focus on the use of machine learning to extract relations from documents, we also describe machine learning and deep learning in natural language processing in the literature review.

2.1 Tourism ontology

Many tourism ontologies have been proposed previously. For example, Mouhim et al. utilized a knowledge management approach for constructing an ontology [7]. They created a Morocco tourism ontology; the approach considered the Mondeca tourism ontology. OnTour [8] was proposed by Siorpaes et al. They built the vocabulary from a thesaurus obtained from the United Nations World Tourism Organisation (UNWTO). The classes as well as the social platform were defined. In [9], the approach for building an e-tourism ontology is based on NLP and corpus processing, using a POS tagger and a syntactic parser. The named entity recognition method is Gazetteer and Transducer. Then the ontology is populated, and consistency checking is done using an OWL2 reasoner.

STI Innsbruck [10] presented the accommodation ontology. The ontology was expanded from the GoodRelations vocabulary [11]. It contains concepts about hotel rooms, hotels, camping sites, types of accommodations and their features, and compound prices. A compound price shows the price breakdown; for example, the prices show weekly cleaning fees or extra charges for electricity in vacation homes based on metered usage. Chaves et al. proposed Hontology, a multilingual accommodation ontology [12]. They divided it into 14 concepts, including facility, room type, transportation, location, room price, etc. These concepts are similar to QALL-ME [13], as shown in Table 1. Among all of these, the typical properties are things such as name, location, type of accommodation, facility, and price. Location, facility, and restaurant are the examples used in our paper for information extraction.

Table 1: Concept mapping between Hontology and QALL-ME [13].

Hontology        QALL-ME
Accommodation    Accommodation
Airport          Airport
BusStation       BusStation
Country          Country
Facility         Facility
Location         Location
MetroStation     MetroStation
Price            Price
Restaurant       Restaurant
RestaurantPrice  GastroPrice
RoomPrice        RoomPrice
Stadium          Stadium
Theatre          Theatre
TrainStation     TrainStation

2.2 Machine Learning in NLP

There are several NLP tasks, including POS (part-of-speech) tagging, NER (named entity recognition), SBD (sentence boundary disambiguation), word sense disambiguation, word segmentation, entity relationship identification, text summarization, text classification, etc. Traditional methods for these tasks require lexical analysis, syntax analysis, and a rule base for
110 ECTI TRANSACTIONS ON COMPUTER AND INFORMATION TECHNOLOGY VOL.15, NO.1 April 2021
hand-coded relations between words. The rule base must match words within varying window sizes, which differ on a case-by-case basis. In the machine learning approach, there is no need to hand-code the rules, and the requirements for lexical and syntax analysis are very small. Tagging (annotation) of words is needed, and some data cleansing is required. The tagging also implies the type of entity (word/phrase), and tagging relationships may also be possible. The model will learn from these tags. The accuracy of the model depends on the correctness and completeness of the training data set, while the rule-based approach's accuracy depends on the correctness and completeness of the rules.

The rule-based approach is also very fragile and sensitive to the language structure. Recently, several works have applied machine learning to NLP tasks [14]. Typically, machine learning is used to learn language features and build a model to solve these tasks. Common models are Naive Bayes, SVM, Random Forest, Decision Tree, Markov models, statistical models, and deep learning models.

2.3 SpaCy

SpaCy [15] provides an open-source library and neural network models to perform basic NLP tasks. The tasks include processing linguistic features such as tokenization, part-of-speech tagging, rule-based matching, lemmatization, dependency parsing, sentence boundary detection, named entity recognition, similarity detection, etc. The models are stochastic, and the training process is based on a gradient and a loss function. The pretrained model can be used in transfer learning as well, and it can be applied to many tasks such as tagging, dependency parsing, entity recognition, and text categorization.

The use of the Python library begins with an import, and the nlp object is used to extract linguistic features. The first example extracts part-of-speech tags. The examples are shown on the SpaCy web pages [17].

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_,
          token.dep_, token.shape_, token.is_alpha, token.is_stop)

The linguistic features are the fields in the print statement: token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, and token.is_stop. The part of speech is in token.pos_, the fine-grained tag is in token.tag_, and the token dependency type is in token.dep_. In the example, "Apple" is the nominal subject (nsubj), and it is a proper noun:

Apple Apple PROPN NNP nsubj Xxxxx True False
is be AUX VBZ aux xx True True
looking look VERB VBG ROOT xxxx True False
at at ADP IN prep xx True True
buying buy VERB VBG pcomp xxxx True False
U.K. U.K. PROPN NNP compound X.X. False False
startup startup NOUN NN dobj xxxx True False
for for ADP IN prep xxx True True
$ $ SYM $ quantmod $ False False
1 1 NUM CD compound d False False
billion billion NUM CD pobj xxxx True False

The next example extracts the named entities.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
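The ent.start_char and ent.end_char values printed above are plain character offsets into the original string. As a minimal, spaCy-independent sketch of what those offsets mean, the helper below (find_span is a hypothetical name, not a spaCy API) recovers the same kind of span with ordinary string search:

```python
def find_span(text, phrase):
    """Return (start_char, end_char, phrase) for the first occurrence
    of phrase in text, mirroring spaCy's ent.start_char/ent.end_char
    convention (the end index is exclusive)."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return (start, start + len(phrase), phrase)

text = "Apple is looking at buying U.K. startup for $1 billion"
print(find_span(text, "Apple"))  # (0, 5, 'Apple')
print(find_span(text, "U.K."))   # (27, 31, 'U.K.')
```

Slicing with these offsets, text[0:5], gives back exactly "Apple"; the span-based annotation format used in Section 3 relies on this property.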
2.4 Deep Learning and BERT

In deep learning, a deep network is used for the purpose of learning features. The common model for this task is the Recurrent Neural Network (RNN) [18]. The RNN captures the previous contexts as states, which are used to predict the outputs (such as the next predicted words). RNNs can be structured in many ways, such as a stack or a grid, as well as supporting bidirectional access to learn from both left and right contexts. The RNN cell can be implemented by LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) units to support the choice of forgetting or remembering.

One typical model is the Seq2Seq model, which applies bidirectional LSTM cells with the attention scheme [19]. An example application is abstractive text summarization [20]. Extractive text summarization can be based on the text-ranking approach (similar to page ranking) to select the top-ranked sentences [21]. Text summarization is an important task for many NLP applications, but the current approaches still yield results with low accuracy on real data sets. This task requires a large and specific training data set to increase accuracy.

Traditional RNNs have a major drawback in that they may not be suitable for applying pre-trained weights. This is because the pre-trained model is usually obtained from different domain contexts; retraining it will make it quickly forget the previous knowledge and adapt itself to the new context. Recently, Google proposed a pre-trained transformer, BERT, which is bidirectional and can be applied to NLP tasks [1]. The transformer concept does not rely on shifting the input to the left and to the right as in Seq2Seq. The model construction also considers sentence boundaries and relationships between sentences. The model derives an embedding representation of the training data.

BERT has been used to perform many tasks, such as text summarization [22], where both extractive and abstractive approaches are explored. The extension is called BERTSUMEXT, where CLS tokens are inserted in multiple places to learn interval embeddings between sentences. The data set is news from NYT and CNN/Daily Mail. In [23], BERT was used for extractive summarization of health informatics lectures as a service: BERT embeddings are used, and then K-Means clustering identifies the sentences closest to the summary. Munikar et al. applied BERT to fine-grained sentiment text classification [24]. The authors used the Stanford Sentiment Treebank (SST) data set, which has five labels: 0 (very negative), 1 (negative), 2 (neutral), 3 (positive), and 4 (very positive). They applied dropout and softmax layers on top of the BERT layers. DocBERT is another approach, for document classification [25]. The authors adopted BERT-base and BERT-large for fine tuning and then applied knowledge distillation to transfer the knowledge to a smaller BiLSTM with fewer model parameters. They experimented on the Reuters-21578, AAPD, IMDB, and Yelp 2014 data sets.

NER is a basic task that is useful for many applications relevant to information searching. In order to find the documents related to a user search, the semantics of the documents must be analyzed. The meaning of phrases or words, as well as their parts of speech, must be known. Named entity recognition is the task which can help achieve this goal. To discover the named entities of a given type, a lot of training data is used. The training data must be annotated with proper tags. A POS tagger is the first requirement, since part-of-speech is useful for discovering the types of entities. Other tagging standards are IO, BIO, BMEWO, and BMEWO+ [26]. These are used to tag the positions of tokens inside word chunks. The tagging process is tedious, so an automatic process is needed. Machine learning can, therefore, be applied to named entity recognition to help automate tagging [27]. Recent works using BERT in the NER task [28-30] consider BERT-NER in an open domain, on the HAREM I Portuguese corpus, and on Chinese medical text, respectively. In our work, we focus on a tourism data set. For the NER task, the tags for the named entity types we look for have to be defined, and we train the model to recognize them. Also, we can train the parser to look for the relations between our entity types. Then, we can extract the desired relations along with the named entities.

3. METHODOLOGY

We started by collecting a tourism data set in Thailand by area and performing annotation. Since we focus on finding the concepts of location, organization, and facility in the tourism data set, the approaches that we can apply are the following:

• Use the classification approach. We classify the sentences by concept type, for example, location (0), organization (1), facility (2). The classification may be obtained by examining the HTML tags when we do web scraping. The classification is also used as a tool for the summarization approach. The summary label is defined as a type of heading, which is also the concept in this case. HTML tags can be used for annotation.

• Use the NER approach. This requires more work on annotation. It also depends on the type of model used for the NER task. In this approach, we are able to find the boundaries of the words that contain the concept. This is the most complicated one for data annotation among them. We will discuss this more later.

Figure 2 presents the overall methodology of preparing data for creating the models for the NER tasks. The difficult part is to label the data automatically, which will also be useful for other similar tasks. In Section 4, we demonstrate the accuracy results of the models after training.
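For the NER approach, the annotation boils down to character-span labels over raw sentences. As a minimal sketch, such spans can be stored in spaCy's (text, {"entities": [...]}) training-data convention; the sentence and offsets below are the worked example used later in this paper, and TRAIN_DATA is just an illustrative variable name:

```python
# One annotated sentence in the (text, annotations) form used for
# spaCy NER training; each entity is (start_char, end_char, label).
TRAIN_DATA = [
    (
        "Staying at @ Home Executive Apartment is a good choice "
        "when you are visiting Central Pattaya.",
        {"entities": [(11, 37, "ORG"), (77, 92, "LOC")]},
    ),
]

text, ann = TRAIN_DATA[0]
for start, end, label in ann["entities"]:
    print(label, repr(text[start:end]))
# ORG '@ Home Executive Apartment'
# LOC 'Central Pattaya'
```

Because the offsets index directly into the raw text, the annotated phrase can always be recovered by slicing, which makes the annotation easy to verify automatically.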
Fig.3: Adding pipeline of noun phrase chunking.
Fig.4: Adding pipeline of TextRank.
    # dependency type between pair
    }),
    ....

In the above example, 'heads' is a list whose length is equal to the number of words, and 'deps' is the list of relation names. Each element refers to its parent. For example, 0 at the last element (water) refers to the relation 'FACILITY' to 'convenience', and 6 refers to the modifier 'TYPE' of 'water'. '-' means no relation (ignored). Obviously, this requires effort for manual annotation. Figure 5 shows the dependency graph from SpaCy for this example, including the dependency between "conveniences" and "desk", as well as between "water" and "HAS FACILITY". Figure 6 shows an example of a relation in our raw corpus for the property isLocated, which is used in the tourism ontology concept. In Figure 7, there can be many relations in one sentence, making it harder to annotate; the sentence exhibits both the isLocated and hasFacility properties. Figure 8 shows the new pipeline following the NE recognition stage to train our relationship model. Only sentences with NEs are extracted, and the annotation of a relation dependency for each sentence can then be processed.

3.4.3 Integrating all tags

To recognize both NEs and relation names at the same time, we added relation tags to the training data. We created indication words which imply the location, organization, facility, and relation names in the sentences. The sentences were split into selected phrases based on the named entities, according to the corpus, tagged as LOC, ORG, LOCATEAT, or FACILITY. We recorded the (starting index position, ending index position) for each word chunk found. In the example, "Staying at @ Home Executive Apartment is a good choice when you are visiting Central Pattaya.", the boundary of words for the tag LOCATEAT is positions 0-10, which is "Staying at". Positions 11-37, "@ Home Executive Apartment", is ORG. Positions 77-92, "Central Pattaya", is LOC, etc. The annotation is formatted in JSON as:

'text': Staying at @ Home Executive
Apartment is a good choice when
you are visiting Central Pattaya.
'entities': [(11,37,ORG), (77,92,LOC)]

In the above example, there are two entities: '@ Home Executive Apartment', which begins at character index 11 and ends at index 37, and whose type is ORG (hotel name); and 'Central Pattaya', which begins at character index 77 and ends at index 92, and whose type is LOC.

3.4.4 BIO tagging

For the BERT-NER task, BIO tagging is required. BIO tagging is commonly used in NLP tasks [31], though other tagging approaches may be used. The training data is adjusted as seen in Figure 9. We transformed the input raw corpus from the SpaCy form into this form. The sentence is split into words, and each word is marked by the POS tagger. Column 'Tag' is our tag name for each word. We have to split each multiword named entity and tag each word, since we have to use the BERT tokenizer to convert words to IDs before training. In the figure, our hotel name starts at '@', so we use the tag B-ORG (beginning of ORG), and 'Home' is an intermediate word after it (I-ORG). BERT needs to convert the input sentences into lists of word IDs along with lists of label IDs. The lists are padded to equal size before being sent to train the model.
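The conversion from character-span annotations to per-token B-/I-/O tags can be sketched in a few lines. This is a simplified illustration using whitespace tokenization (the actual pipeline re-tokenizes with BERT's WordPiece tokenizer); spans_to_bio is a hypothetical helper name:

```python
def spans_to_bio(text, entities):
    """Convert character-span annotations (start, end, label) into
    per-token BIO tags. Whitespace tokenization only; the real
    pipeline uses BERT's WordPiece tokenizer instead."""
    pairs = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)   # character offset of this token
        end = start + len(tok)
        pos = end
        tag = "O"
        for (s, e, label) in entities:
            if start < e and end > s:  # token overlaps the entity span
                tag = ("B-" if start <= s else "I-") + label
                break
        pairs.append((tok, tag))
    return pairs

sent = ("Staying at @ Home Executive Apartment is a good choice "
        "when you are visiting Central Pattaya.")
print(spans_to_bio(sent, [(11, 37, "ORG"), (77, 92, "LOC")]))
# [('Staying', 'O'), ('at', 'O'), ('@', 'B-ORG'), ('Home', 'I-ORG'), ...]
```

The first token of each entity receives the B- prefix and the remaining tokens the I- prefix, matching the B-ORG/I-ORG scheme shown in Figure 9.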
4. EXPERIMENTAL RESULTS

The experiments were run on an Intel 3.2 GHz Core i5 with 32 GB of RAM and two Titan Xp GPUs for training the models. We split the results into subsections: NER and text classification, for both SpaCy and BERT, on our tourism data set. All the experimental code and data, including that for NER, can be downloaded at http://github.com/cchantra/nlp_tourism.

4.1 Named Entity Recognition (NER)

Using the data gathered in Section 3.1, we consider the NER task using SpaCy and BERT. The performance is reported below.

4.1.1 SpaCy

Three models were built using SpaCy: 1) a model to recognize ORG/LOC, 2) a model to recognize FACILITY, and 3) a model to recognize all of ORG/LOC/FACILITY. For model 1), we used the raw corpus with annotations only for ORG/LOC, containing 13,019 sentences. For 2), the raw corpus only contains FACILITY sentences, with 13,522 rows. For 3), we combined the corpus data from 1) and 2) to create one model recognizing all entities. Each model was trained until the loss was stable. Here, we manually selected sentences for tagging.

Fig.8: Relation creation.
Fig.9: BERT training data for NER.

The tagged data is divided into 70% training and 30% testing. Training for LOC/ORG had no difficulty at all, since ORG/LOC labels are already in the pre-trained model of SpaCy; we added new words and labels and retrained it. For FACILITY, we added a new tag label, and the model was trained to learn the new tags. The loss was higher than for the LOC/ORG model. Also, when we combined the three tags, the total loss was even higher. If we do not manually select sentences containing these named entities, the large raw corpus takes a long time to train and the loss is very high, over 1,000.

Table 3: Comparison between predicted and annotated NE using SpaCy.
Type        LOC/ORG           FAC             LOC/ORG/FAC
            train    test     train    test   train    test
#Annotated  46,327   19,787   22,167   9,427  93,661   40,745
#Predicted  70,156   29,873   18,716   7,299  85,867   37,387
Diff        23,829   10,086   -3,451  -2,128  -7,794   -3,358

Table 3 shows the correctness of the three models using the SpaCy approach. Row '#Annotated' shows the counts of the data labeled for training, and '#Predicted' is the number of NEs predicted by the model. 'Diff' is the difference between '#Annotated' and '#Predicted'. Note that there are negative values in 'Diff'. This is because the model discovers more entities than the number of annotated labels. For LOC/ORG, SpaCy has built-in (pre-trained) labels. Thus, it
may identify more entities than our number of annotated labels. For 'FACILITY', the number of predicted labels missed is around 15% for training and 22% for testing.

Note that SpaCy predicts the exact boundary of the phrases. The prediction results may not exactly match the annotation, but they provide a close boundary of the entity, or a good estimate. If we counted according to the number of exactly matched positions, the predictions would be mostly wrong. Thus, we loosen the measurement to count only the number of predicted ones. For LOC/ORG/FAC, the number of missing ones is 8% for training and 8% for testing.

Consider the relation names together with the above named entities. Figure 10 shows the results of marking all NEs and relation names. Tag "USE" indicates "FACILITY", and "LOCATEAT" indicates "LOCATION". When creating one model for recognizing all five types of tags (ORG/LOC/FACILITY/USE/LOCATEAT), we obtain 95% accuracy for both training and testing.

Table 4 shows the accuracy results for the five tags including relations. Row "Train/annotated" shows the number of annotated training data entries for each tag, and row "Train/predicted" is the number of predicted tags. Similarly, "Test/annotated" is the number of annotated test data entries, and "Test/predicted" is the number of predicted tags. The average number of missing tags for both training and test data is around 5%.

Table 4: Annotated and predicted results for five tags.
Data             LOC     USE     LOCATEAT  FACILITY  ORG
Train/annotated  39,173  24,169  27,911    34,180    20,750
Train/predicted  36,342  24,169  27,913    30,875    19,591
Test/annotated   16,949  10,242  12,081    14,467    8,887
Test/predicted   15,716  10,245  12,084    13,053    8,459

Figure 11 presents an example of the results: Figure 11(a) is the prediction, while Figure 11(b) is the actual annotation. If the phrases are correctly annotated, like "Naga Peak Boutique Resort", the prediction will be correct. For "Nong Thale", which is the name of a location, the prediction includes the word "city" as well. "Ao Nang Krabi Boxing Stadium" is classified as ORG. SpaCy assumed the number 3.45 to be a distance, so it is classified as LOC, but we do not label this in our data set.

Figure 12 is another example, where the manual annotation is not complete. With the pretrained model of SpaCy, it can suggest entity types such as LOC and ORG.

4.1.2 BERT

For BERT, we use the BIO tagging style. The training accuracy is depicted in Table 5; BERT performs well on all of the tasks. The original BERT (released November 2018) relies on its own tokenizer, which includes the token conversion to IDs. It cannot tokenize our special multiword names, which are proper nouns, very well. For example, 'Central Pattaya' is tokenized into u'central', u'pat', u'##ta', u'##ya', which causes our labels to have a wrong starting index at the position of 'pattaya'. The unknown proper nouns are chopped into pieces, making the label positions shift out. According to the paper's suggestion, this needs to be solved by using one's own WordPiece tokenizer [32]. For Table 5, we created our own set of vocabularies for training based on the data we crawled. The accuracy results are measured based on the number of correct predictions. LOC/ORG shows the results for locations and hotel names, FAC shows the results for the facilities of each hotel, and the last column, LOC/ORG/FAC, shows the results for all tags together. The results for all cases are around 70%. The F1-score is quite low.

Table 5: Comparison between predicted and annotated NE using BERT with our own vocabulary.
Type      LOC/ORG          FAC              LOC/ORG/FAC
          train    test    train    test    train    test
Loss      0.019    0.0165  0.375    0.029   0.123    0.109
Accuracy  0.751    0.717   0.859    0.7484  0.736    0.629
F1        0.258    0.346   0.245    0.245   0.287    0.464

Table 6 shows the precision, recall, and F1-score of the last column (LOC/ORG/FAC) in Table 5, measured by sklearn metrics. The recall values are low.

Table 6: Precision, Recall, F1 for BERT with our own vocabulary.
Tag    precision  recall  F1-score  support
B-ORG  1.00       0.08    0.15      50024
I-ORG  1.00       0.20    0.33      39288
B-LOC  0.99       0.12    0.21      22723
I-LOC  1.00       0.08    0.15      13782
B-FAC  1.00       0.27    0.43      37335
I-FAC  0.99       0.54    0.70      13672
O      0.53       1.00    0.69      163016

If we consider using the default BERT tokenizer, ignoring the tokenization problem mentioned previously, we see in Table 7 that the accuracy, precision, recall, and F1-score are higher. Table 8 shows the precision, recall, and F1-score; compared to Table 6, the recall is higher in most cases, and thus the F1-score is higher.

Table 7: Comparison between predicted and annotated NE using BERT (WordPiece).
Type      LOC/ORG         FAC             LOC/ORG/FAC
          train   test    train   test    train   test
Loss      0.061   0.153   0.03    0.0295  0.123   0.109
Accuracy  0.958   0.955   0.939   0.932   0.866   0.866
F1        0.737   0.715   0.421   0.431   0.442   0.464

Table 8: Precision, Recall, F1 for BERT with default tokenization (WordPiece).
Tag    precision  recall  F1-score  support
B-ORG  0.87       0.33    0.48      30821
I-ORG  0.79       0.50    0.61      37809
B-LOC  0.98       0.37    0.54      21445
I-LOC  0.99       0.23    0.38      14321
B-FAC  0.86       0.54    0.67      49396
I-FAC  0.99       0.33    0.49      67105
O      0.85       0.99    0.91      734663

Both tables are based on the BERT pre-trained model BERT-Large, Uncased (Whole Word Masking): 24 layers, 1024 hidden units, 16 heads, 340M parameters. In the default pre-processing code, WordPiece tokens are randomly selected for masking. The newer technique is called Whole Word Masking: all of the tokens corresponding to a word are masked at once, while the overall masking rate remains the same. The training is identical; we can still predict each masked WordPiece token independently. In Table 8, the improvement comes from the higher recall, while the precision is lower due to the default BERT vocabularies.

4.1.3 BERT on other data sets

Still, the F1-score is not quite satisfactory. This is likely due to the vocabularies and keywords, as well as the data set. Also, there is another problem: the data is not cleaned enough. We retried the experiments on other similar kinds of data sets with clear keywords. In the second data set, as mentioned in Section 3.1, we extracted only the review comments from TripAdvisor. There are three categories of reviews: hotel reviews, shop and tourism reviews, and restaurant reviews. We performed the same preprocessing as in the previous experiment in Section 4.1.2. The keywords are built based on the review types. Each data set has the characteristics in Table 9. Under column "size", "original" is the total count of data lines scraped, and "cleaning" shows the number of data elements which remained
after cleansing. The cleansing and preprocessing processes do things such as removing irrelevant columns, creating a list of keywords based on types (location, service, etc.), keeping the data rows containing the keywords, and finding the category type (label) of each row. The keywords are simpler than in the previous cases, since they are only indicators in the reviews that imply the locations or services. They are not complicated proper nouns like those in the previous data set. For example, the location keywords are beach side, street food, jomtien beach, kantary beach, next to, etc. The service keywords are cleaning, bathroom, swimming pool, shower, amenities, etc. Finally, we save the selected data rows containing keywords in a new file. The columns "train" and "test" show the data size divided into training and testing sets. Column "category and size" shows the category for each type of data and its size. The data is also used for other tasks, such as the text classification task.

Table 10 presents the F1-score, validation loss, and validation accuracy of the BERT-NER task on each row of Table 9. The table shows higher values for all data sets. For the Shopping & Tourism and Restaurant data sets, it yields the highest accuracy. This confirms the effectiveness of the approach when a cleaner data set is used.

Table 10: BERT-NER tasks for hotels, shopping, and restaurant data sets.
Data set            F1-score  Val loss  Val accuracy
Hotels              0.7796    0.0145    0.9933
Shopping & Tourism  0.9888    0.0037    0.9997
Restaurant          0.9839    0.0052    0.9986

Tables 11-13 present the detailed scores of each category. In Table 11, RLOC and RSER are the BIO tags for the location and service tags in Table 9. In Table 12, NTV and STV are the BIO tags for natural-travel and shopping-travel respectively, and in Table 13, FOOD and RLOC are the BIO tags for the food and location tags respectively. For all of the tags, we get high scores for precision, recall, and F1-score. This shows the effectiveness of BERT-NER on these data sets.

Table 11: Precision, Recall, and F1-score for BERT Hotel.
Tag     precision  recall  F1-score  support
B-RLOC  0.98       0.97    0.97      829
I-RLOC  0.99       0.99    0.99      792
B-RSER  0.96       0.93    0.95      133
I-RSER  1.00       1.00    1.00      15
O       1.00       1.00    1.00      131991

Table 12: Precision, Recall, F1-score for BERT Shopping & Tourism.
Tag    precision  recall  F1-score  support
B-NTV  0.98       0.99    0.99      867
I-NTV  0.97       0.98    0.98      539
B-STV  0.82       0.66    0.73      35
I-STV  0.71       0.48    0.57      25
O      1.00       1.00    1.00      112454

Table 13: Precision, Recall, F1-score for BERT Restaurant.
Tag     precision  recall  F1-score  support
B-FOOD  0.99       0.98    0.99      1888
I-FOOD  0.97       0.99    0.98      449
B-RLOC  0.99       0.99    0.99      469
I-RLOC  0.97       0.98    0.97      122
O       1.00       1.00    1.00      168336

4.1.4 Comparison to other NERs

Table 14 compares our approach with other approaches. The model types are different, and different tags are considered. Compared to the other frameworks, we focus on the methodology of automated data preparation steps and use pre-trained network models as a tool.

Table 14: Comparison with other NER approaches.
Approach            type             model             data set                tourism tag
Vijay et al. [33]   Supervised       CRF               TripAdvisor, Wikipedia  hotel name, POI, review, star, ...
Saputro et al. [34] Semi-Supervised  Naive Bayes, SSL  Top Google search       nature, region, place, city, negative
Our work            Supervised       SpaCy, BERT       TripAdvisor, Traveloka, Location, Facility, Organization
                                                       Hotels.com

4.2 Text Classification

We use the review data set in three categories. For each category, in labeling, we use a category number representing the category of each review. For example, for the restaurant data set, a label of 0 means the review is about location, and a label of 1 means the review is about food. For the classification experiment, we used these numbered labels. For example, "50m to the beach means you'll reach the sea within that distance" is in "Location Reviews", which is the
Fig.14: Accuracy, precision, recall, and F1-score of Shopping & Tour dataset.
[7] S. Mouhim, A. Aoufi et al., "A knowledge management approach based on ontologies: the case of tourism," SpringerPlus, vol. 4, no. 3, pp. 362-369, 2011.
[8] D. Bachlechner. (2004) OnTour: The Semantic Web and its Benefits to the Tourism Industry. Retrieved 13 June 2019. [Online]. Available: https://pdfs.semanticscholar.org/eabc/d4368f5b00e248477a67d710c058f46cd83e.pdf
[9] M. Sigala, L. Mich et al., "Connecting destinations with an ontology-based e-tourism planner," in Information and Communication Technologies in Tourism, M. Sigala and M. L. Mich, Eds. Springer, Vienna, 2007, pp. 21-32.
[10] STI Innsbruck. (2013) Accommodation Ontology Language Reference. Retrieved 8 March 2019. [Online]. Available: http://ontologies.sti-innsbruck.at/acco/ns.html
[11] GoodRelations: Web vocabulary for e-commerce. A paradigm shift for e-commerce. (2008) Retrieved 8 March 2019. [Online]. Available: http://www.heppnetz.de/projects/goodrelations/
[12] M. Chaves, L. Freitas, and R. Vieira, "Hontology: A multilingual ontology for the accommoda-