Article
Zero-Shot Relation Triple Extraction with Prompts for
Low-Resource Languages
Ayiguli Halike 1,2 , Aishan Wumaier 1,2, * and Tuergen Yibulayin 1,2
1 School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2 Xinjiang Laboratory of Multi-Language Information Technology, Urumqi 830046, China
* Correspondence: hasan1479@xju.edu.cn; Tel.: +86-136-599-13514
Abstract: Although low-resource relation extraction is vital in knowledge construction and characterization, more research is needed on the generalization to unknown relation types. To fill the gap in the study of low-resource (Uyghur) relation extraction methods, we created a zero-shot, prompt-based relation triplet extraction task setup. Each triplet extracted from an input phrase consists of the subject, relation type, and object. This paper suggests generating structured texts by prompting language models to provide relation instances. Our model consists of two modules: a relation generator and a relation-and-triplet extractor. In the relation generator stage, we use the Uyghur relation prompt to generate new synthetic data. In the relation and triplet extraction stage, we use the new data to extract the relation triplets in the sentence. We use multi-language model prompts and structured text techniques to offer a structured relation prompt template. This is the first research that extends relation triplet extraction to a zero-shot setting for Uyghur datasets. Experimental results show that our method achieves a maximum weighted average F1 score of 47.39%.
1. Introduction
Relation and triplet extraction (RE) is a critical topic in information extraction (IE), and it may be used for a variety of natural language processing (NLP) tasks [1–3]. The vast databases of labeled samples required by conventional approaches, on the other hand, are frequently expensive to annotate [4]. However, preceding zero-shot relation work does not necessitate the extraction of the whole relation triplet. Chen and Li [3] present ZS-BERT, a unique multi-task learning approach, to directly estimate unknown relations without hand-crafted feature tagging. There are numerous methods in use currently to address the problems caused by data scarcity. Distantly supervised [4,5] methods are fast enough to construct large-scale corpora, and corpus annotation is inexpensive, but the quality of annotation produced by this method is not as good as the results of manual human annotation. The task objective could also be formulated so that the label space is unrestricted.
However, since relation triplet extraction is a structured prediction problem rather than a sequence classification task, the existing formulations cannot be used in this manner [6]. John Koutsikakis et al. [7] present GREEK-BERT, a monolingual BERT-based language model for contemporary Greek. Xin Xu et al. [16] present an empirical investigation into the development of relational data extraction systems in low-resource environments. Based on recent pre-trained language models, they comprehensively investigate two methods for improving efficacy in low-resource settings: data augmentation technologies and self-training to generate more labeled in-domain data. Pundlik et al. [8] worked on a multi-domain Hindi dataset; the architecture described in that article consists of two stages. Yadav et al. [9] showed that sentiment analysis for the mixed Hindi language could be performed using three approaches, the first of which was to use a neural network to classify previously known words.
Wikipedia [10] constructs this dataset using a snapshot of English Wikipedia. The titles of Wikipedia articles are considered brief texts, whereas the abstracts of all Wikipedia articles are used to build the associated lengthy documents. This dataset includes a total of 30,000 short texts and 1,747,988 long documents. A second dataset is compiled from DBLP [10] using the bibliography database. The titles of computer science literature represent brief texts, whereas the abstracts of all published papers are collected to form the corresponding lengthy documents. This dataset consists statistically of 81,479 short texts and 480,558 long records. Programmable Web [11] generates a real-world dataset for service recommendation and description augmentation. Fei Cheng et al. [12] provide an open-source natural language processing toolbox for extracting Japanese medical data. They create a pipeline system consisting of three components for identifying medical entities, categorizing entity modalities, and extracting relations. Existing relation extraction datasets for the Russian language are few and comprise a limited number of instances. Hence, D. I. Gordeev et al. [13] chose to develop RURED, a new Ontonotes-based sentence-level named entity and relation extraction dataset. The collection includes over 500 annotated texts and over 5000 labeled relationships. Leonhard Hennig et al. [14] present a German-language dataset, which is human-annotated with 20 coarse- and fine-grained entity types and entity-linking information for geographically linkable entities. This first German-language dataset combines annotations for Named Entity Recognition and Relation Extraction. Ahmed Hamdi et al. [15] present the creation of the NewsEye resource, a multilingual dataset for named entity identification and linking, enhanced with attitudes towards named entities. The collection consists of diachronic historical newspaper articles published between 1850 and 1950 in French, German, Finnish, and Swedish.
Due to the lack of low-resource datasets, research on relation and triplet extraction has mostly been restricted to the English, German, French, Russian, Japanese, Finnish, Swedish, Arabic, and Hindi languages [11]. In this study, we present an alternate relation extraction method for a low-resource language (Uyghur) that can extract facts of new kinds that were not previously stated or observed. Despite the lack of annotated training examples, including the test relation labels, our method aims to extract triplets of the subject, object, and relation type from each sentence. Tables 1 and 2 provide a task example and an overview of task settings for easy comparison. This is the first research that, to our knowledge, extends relation triplet extraction to a zero-shot setting for Uyghur datasets. As a result, we suggest a Relation Prompt, which redefines the zero-shot task as the production of new data for the Uyghur language. The fundamental idea is to use relational prompts to motivate a language model to produce artificial training samples capable of expressing the necessary relations. These synthetic data can then be used to train a separate model that extracts the triplets [16].
Table 1. Task examples of triplet extraction.

Sentence                    Subject     Object
Mr. Alim died in Urumqi.    Mr. Alim    In Urumqi
In Table 1, Sentence, Subject, and Object refer to the sentence, first entity, and second entity, respectively.
Table 2. Task settings compared against our proposed Zero-Shot-Prompt relation triplet extraction.

Task Setting                                    Input                                  Output                              Supervision
Relation Classification                         Sentence, entity_head, entity_tail     label                               Full
Zero-Shot Relation Classification               Sentence, entity_head, entity_tail     label                               Zero-Shot
Relation Triple Extraction                      Sentence                               entity_head, entity_tail, label     Full
Zero-Shot-Prompts Relation Triple Extraction    Sentence                               entity_head, entity_tail, label     Zero-Shot

In Table 2, Sentence, entity_head, and entity_tail refer to the sentence, first entity, and second entity, respectively.
2. Related Work
RE is a crucial prerequisite task for creating large-scale knowledge graphs (KG), which
can be used for automated responses, information retrieval, and other natural language
processing (NLP) activities [17–21]. Previous papers on RE had a pipelined approach: first,
all entities in the input phrase are identified using named entity recognition (NER), then,
relational classification (RC) is applied to pairs of extracted entities [13]. Pipelined ap-
proaches typically have an issue with error propagation and ignore the connection between
the two phases [20]. Ref. [21] provides a reading comprehension method in which one
or more natural-language questions are associated with each relation slot. To improve
the performance of relation classification systems, Ref. [22] performs zero-shot relation
classification by using relation descriptions, current textual entailment models, and freely
available textual entailment datasets. Ref. [23] proposes a unique dataset for Reading
Comprehension (RC) for neural techniques in language understanding. Using GPT-3 as a
case study, Ref. [24] shows that zero-shot prompts can significantly outperform those that
use few-shot examples. This analysis motivates the rethinking of the role of prompts in
controlling and evaluating powerful language models. Refs. [25–27] explore techniques for
exploiting the capacity of narratives and cultural anchors to encode nuanced intentions.
In natural language processing, prompting-based methods have shown promise in novel
paradigms for zero-shot or few-shot inference [28]. Another advantage is that extensive language models can be adapted to new tasks without expensive fine-tuning [29].
In order to combine many pathways and learn to share strength in a single RNN expressing
logical composition across all relations, Refs. [30–32] present joint learning and neural atten-
tion modeling. However, such methods have many disadvantages for more difficult tasks
such as triplet extraction [33,34]. Refs. [35,36] construct a Uyghur grammatical dictionary
and provide extensive Uyghur grammatical information. Ref. [37] proposed a hybrid joint
extraction model for supervised Uyghur datasets. To extend Uyghur RE research, Ref. [38]
proposed a hybrid method to expand the Uyghur-named entity relation corpus. Sonali
Rajesh Shah et al. [39] analyze, review, and discuss the methods, algorithms, and challenges
researchers face when performing indigenous language recognition. The primary objective
of this assessment is to comprehend the recent work undertaken in South Africa for indige-
nous languages. The trends in the field of SA are being uncovered by analyzing 23 articles.
Machine learning, deep learning, and sophisticated deep learning algorithms were utilized
in 67% of the reviewed publications. Twenty-nine percent of researchers have employed a
lexicon-based methodology. SVM (Support Vector Machine) and LR (Logistic Regression)
outperformed all other machine learning approaches. Chenhe Dong et al. [39] provide a historical overview of deep learning research on natural language generation, highlighting the unique nature of the problems to be addressed. It begins by contrasting generation with language comprehension, establishing fundamental concepts regarding the objectives, datasets, and deep learning methods. This is followed by a section on evaluation metrics derived
from the output of generation systems, illustrating the possible performance types and the
areas of difficulty. Finally, a list of open problems presents natural language generation
research’s main challenges and future orientations. Nasrin Taghizadeh [40] proposed a
cross-language method for relation extraction, which uses the training data of other lan-
guages and trains a model for relation extraction from Arabic text. The task is supervised
learning, in which several lexical and syntactic features are considered. The proposed
method mainly relies on the Universal Dependency (UD) parsing and the similarity of
UD trees in different languages. Regarding UD parse trees, all the features for training
classifiers are extracted and represented in a universal space. Yuna Hur et al. [41] suggest
Entity-Perceived Context representation in Korean to provide a better capacity for grasping
the meaning of entities and considering linguistic qualities in Korean.
3. Zero-Shot-Prompt-RE: Methodology
The majority of zero-shot research has been conducted in English. We provide a
Uyghur-Relation-Prompt that employs relation types as prompts to produce Uyghur syn-
thetic relation instances of target unknown relation types in order to extract triplets for
unknown relation types in Uyghur-Zero-Shot-Prompt-RE. Most research on relation extraction targets English; far less work exists on Uyghur, which belongs to a different language family from English and whose grammar is more complex, with vowel weakening and rich, sometimes omitted, affixes. Therefore, this paper implements a rule- and template-based approach to Uyghur relation extraction research.
Input:   Relation: Place of death (P20)
         Context: Mr. Alim died in Urumqi.
Output:  First Entity: Alim, Second Entity: Urumqi.
Figure 1. The Uyghur Relation Prompt structured templates.
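As a rough illustration of how the structured template in Figure 1 can be implemented, the Python sketch below serializes a relation label into a generator prompt and parses the structured output back into an entity pair. The marker strings follow Figure 1; the helper functions themselves are illustrative, not code released with this paper.

```python
# Minimal sketch of the Figure 1 structured prompt template.

def build_prompt(relation: str, context: str = "") -> str:
    """Serialize a relation label (and optional context) into the
    structured prompt fed to the generator language model."""
    return f"Relation: {relation}. Context: {context}"

def parse_generated(text: str) -> tuple[str, str]:
    """Recover the (head, tail) entity pair from the generator's
    structured output."""
    head = text.split("First Entity:")[1].split(",")[0].strip()
    tail = text.split("Second Entity:")[1].strip().rstrip(".")
    return head, tail

out = "Context: Mr. Alim died in Urumqi. First Entity: Alim, Second Entity: Urumqi."
print(build_prompt("Place of death (P20)"))
print(parse_generated(out))  # ('Alim', 'Urumqi')
```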
Usually, we define the knowledge graph formally as G = {ε, R, F}, where ε, R, F denote the set of entities, relations, and facts, respectively. A fact is denoted as a triplet, T. The notation of the specific triadic relation extraction design and its description are shown in Table 3.

Table 3. Common symbolic descriptions in relationship extraction.

Notation                            Description
G = {ε, R, F}                       A knowledge graph
F                                   Facts
D = (S, T, R)                       Sentence, Triplet, Relation
T_u = (e_head, e_tail, y)           A Uyghur triple of head, relation, and tail
D_s                                 Seen datasets
D_u                                 Unseen datasets
U_g                                 The Uyghur relation generator
U_e                                 The relation detector
Y_s                                 The set of seen relational types
Y_u                                 The set of unseen relational types
D_new_u                             The labeled unseen dataset
D_new_s                             The labeled seen dataset
W = [w_1, w_2, ..., w_n]            The conditional generation probability for each sequence
R_s = r_s1 + r_s2 + ... + r_sn      The seen relation type sets
R_u = r_u1 + r_u2 + ... + r_um      The unseen relation type sets
m = |R_u|                           The size of the unseen relation type set
n = |R_s|                           The size of the seen relation type set
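To make the notation in Table 3 concrete, a fact and a labeled sample can be encoded as small record types. This is purely an illustrative rendering of T_u = (e_head, e_tail, y) and D = (S, T, R), not code from the paper.

```python
from dataclasses import dataclass

@dataclass
class UyghurTriple:
    """T_u = (e_head, e_tail, y): head entity, tail entity, relation label."""
    e_head: str
    e_tail: str
    y: str

@dataclass
class Sample:
    """D = (S, T, R): a sentence paired with its extracted triplet(s)."""
    sentence: str
    triples: list[UyghurTriple]

example = Sample("Mr. Alim died in Urumqi.",
                 [UyghurTriple("Alim", "Urumqi", "Place of death")])
```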
Unlike other decoding methods, such as beam search, triplet search decoding takes
advantage of the particular triplet structure.
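A simplified sketch of that idea follows, assuming the extractor exposes scores for candidate head spans, candidate tail spans, and a relation classifier; the branching over the top-b head and tail candidates with a combined ranking score reflects the description above, while all names are placeholders rather than the paper's released code.

```python
import itertools

def triplet_search_decode(head_scores, tail_scores, relation_scorer, b=4):
    """Branch over the top-b candidate heads and tails (exploiting the
    triplet structure instead of a flat beam), score every (head, tail)
    pair with the relation classifier, and rank the candidate triplets
    by a combined score."""
    heads = sorted(head_scores, key=head_scores.get, reverse=True)[:b]
    tails = sorted(tail_scores, key=tail_scores.get, reverse=True)[:b]
    candidates = []
    for h, t in itertools.product(heads, tails):
        label, rel_score = relation_scorer(h, t)
        score = head_scores[h] + tail_scores[t] + rel_score
        candidates.append(((h, label, t), score))
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

Keeping several ranked triplets per sentence is what enables the multi-triplet extraction ablated in Section 5.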
4. Experiments
4.1. Datasets
Uyghur Relation Extraction datasets for Zero-Shot-Prompt-RE are created by distant supervision from the Wikidata knowledge base and Wikipedia articles, as can be seen in Table 4.
We use the same methodology as [3,50,58] to divide the data into known and unknown
label sets randomly.
In our experiments, we have compared different sets of unknown relation types of size
m ∈ {5, 10} to confirm the effect of the size of the set of relation types on the experimental results.
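A hedged sketch of that split: relation types are shuffled with a fixed seed, m of them are held out as the unseen set R_u, and each sample is routed to the seen or unseen split by its gold label. The function below is our reading of the protocol in [3,50,58], not released code.

```python
import random

def split_by_relation(samples, relations, m, seed=0):
    """Hold out m relation types as unseen (R_u); a sample belongs to
    the unseen dataset D_u iff its gold label is in R_u."""
    rng = random.Random(seed)
    rels = list(relations)
    rng.shuffle(rels)
    r_unseen, r_seen = set(rels[:m]), set(rels[m:])
    d_seen = [s for s in samples if s["label"] in r_seen]
    d_unseen = [s for s in samples if s["label"] in r_unseen]
    return d_seen, d_unseen, r_seen, r_unseen

# m = 5 or m = 10, matching the settings compared in the experiments.
```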
$$P = \frac{TP}{TP + FP} \times 100 \quad (4)$$

$$R = \frac{TP}{TP + FN} \times 100 \quad (5)$$

$$F1\_score = \frac{2PR}{P + R} \times 100 \quad (6)$$
where TP indicates the number of labels that are positive and predicted positive, TN represents the number of labels that are negative and predicted negative, FP represents the number of labels predicted positive but actually negative, and FN denotes the number of labels that are positive but predicted negative [28].
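Equations (4)–(6) translate directly into code; a minimal sketch (because P and R below already carry the factor of 100, the harmonic mean needs no further scaling):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Equations (4)-(6): precision, recall and F1 as percentages."""
    p = tp / (tp + fp) * 100 if tp + fp else 0.0   # Eq. (4)
    r = tp / (tp + fn) * 100 if tp + fn else 0.0   # Eq. (5)
    f1 = 2 * p * r / (p + r) if p + r else 0.0     # Eq. (6)
    return p, r, f1

# Example: tp=30, fp=20, fn=25 gives P = 60.00, R = 54.55, F1 = 57.14.
print(precision_recall_f1(30, 20, 25))
```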
Table 5. Results for our method in different relation types in Uyghur languages.

Our approach maintains relatively good F1 performance for relational classification when the set of unseen relational labels increases. It may be possible to scale better to larger sets of unseen tags based on relational prompts, which is very friendly for applications of relational triplet extraction in open domains. The reason for the low recall is that, when the samples are labeled, most positive samples are mistakenly labeled as negative. This makes it harder for the model to learn from positive examples, so positive samples tend to be predicted as negative. To address this problem, the next step is to correct the mislabeled samples and observe whether the results change.
5. Analysis
5.1. Ablation Study
The accuracy of the generated data is crucial to the success of our Zero-Shot-Prompt-Relation Triplet Extraction (ZSP_RTE) technique. Therefore, we contrast a variety of genuine and synthetic data samples and perform comparative experiments. Specifically, we count the unique words and other items in the texts, using the UyRE-ZSP validation set to produce five-label sentences and an equivalent number of synthetic phrases for comparison. Table 7 shows that the synthetic sentences contain more entities with a variety of unique components. However, the generated sentences include fewer unique words overall and less lexical diversity, which may be explained by the fact that entity names are often unique: the generator language model has been exposed to many unique entity names during extensive pre-training. On the other hand, non-entity words account for the majority of the overall number of unique words. By employing prompts to condition the production of sentences explicitly for unknown relation labels, the output sentences may include less context-specific information.
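The comparison described above reduces to counting unique words and unique entity mentions per sample set; a rough sketch, with whitespace tokenization standing in for proper Uyghur morphological segmentation:

```python
def diversity_stats(samples):
    """Count unique words and unique entity mentions across a sample
    set, as in the real-vs-synthetic comparison of Table 7."""
    words, entities = set(), set()
    for s in samples:
        words.update(s["sentence"].split())      # crude tokenization
        entities.update((s["head"], s["tail"]))  # entity mentions
    return {"unique_words": len(words), "unique_entities": len(entities)}

# diversity_stats(real_samples) vs. diversity_stats(synthetic_samples)
# reproduces the lexical-diversity contrast discussed above.
```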
Table 8 shows the results of an ablation study that was performed to test how well our
decoding method and task-specific fine-tuning worked for the Uyghur visible relation set
on the multi-triple recombination Zero-Shot-Prompt-Relation Extraction. The comparison
was conducted on the UyRE-ZSP validation set, which contains ten invisible identifiers.
This large performance disparity indicates that triplet search decoding is essential for Zero-Shot-Prompt-Relation Triplet Extraction for Uyghur with multiple triplets, implying that the enumeration and classification of associative triplet candidates are of adequate quality. Second, we observe that the relation extractor's performance degrades significantly when the visible relation samples from the column set are not fine-tuned prior to the final fine-tuning on the synthetic samples generated for the invisible labels.
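In outline, the two-stage recipe that this ablation isolates looks as follows; `extractor.fit` stands in for whatever training loop is used and is not a specific library API.

```python
def train_extractor(extractor, seen_data, synthetic_unseen_data,
                    fine_tune_on_seen=True):
    """Two-stage recipe tested in the ablation: first fine-tune on gold
    samples of the seen relations, then fine-tune on synthetic samples
    generated for the unseen relations. Skipping stage 1 corresponds to
    the degraded 'Extractor fine-tuning' row of Table 8."""
    if fine_tune_on_seen:
        extractor.fit(seen_data)          # stage 1: seen relations
    extractor.fit(synthetic_unseen_data)  # stage 2: synthetic unseen
    return extractor
```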
Table 8. Ablation results of Uyghur multi-triplet extraction.

Model                             F1       ΔF1
Full method                       26.23
Uyghur Triplet Search Decoding    12.33    −13.9
Extractor fine-tuning             14.54    −11.69
When the size of the unseen label set, m, increases, our technique can retain a relatively good classification F1 performance, while Zero-Shot-Prompt-RE exhibits a more pronounced decline in performance.
[Generated sample examples by relation label, each shown in the source alongside its Uyghur original:
Mother: "Ms. Li bought a lot of toys for her daughter, Yushin." / "Ms. Zhang went to Nana's school to learn about her study."
Spouse: "Alimjan married Rana this month." / "Mr. Cao went on a trip with Mrs. Li Huilan."
Place of birth: "Sarwar was born in Urumqi in 90." / "Mr. Li was born in 1998 in Guangzhou, Guangdong Province."]
“get married”. These results imply that a potential area for future development may be
to make the head and tail entities created by the system correspond more closely to the
specified connection.
Generated Samples    F1 (m = 5)    F1 (m = 10)
100                  19.56         23.15
200                  23.24         26.54
500                  26.59         27.42
1000                 27.54         25.25
1500                 24.25         24.12
for m = 10, and worse for relations such as "Part of" and "Award received". It also performs worse for relations such as "Part of" and "nationality". These results suggest that Relation Prompt performs best for concrete relations that constrain the output context more effectively.
Relation Labels    F1 (m = 5)    F1 (m = 10)
Part of            13.65         11.23
nationality        23.21         25.63
spouse             25.65         36.52
Educated At        39.87         38.23
Award Received     15.63         39.25
sister             28.25         41.58
Place of death     42.68         44.32
6. Conclusions
This paper describes the 'Zero-Shot-Prompt-RE relational triple extraction for Uyghur' task setting, which was designed to overcome the main problems with earlier task settings and to encourage more research in low-resource relation extraction. To handle Zero-Shot-Prompt-RE relational triple extraction tasks, we offer RelationPrompt and show that language models can use relation label prompts to create synthetic training data in the form of structured sentences. We present an effective and interpretable triplet search decoding approach to address the restriction of extracting many related triplets from a phrase. Using a multi-task learning structure and contextual representation learning based on the features of Uyghur entity words, prompt learning embeds the input sentences well into the embedding space and significantly improves performance. We also conducted extensive experiments to investigate different aspects of ZS-BERT, ranging from hyperparameter settings to case studies, ultimately showing that ZSP-RTE can consistently outperform existing relational extraction models in a zero-shot setting. In addition, learning effective relational embeddings using relational prototypes as auxiliary information may also help semi-supervised learning or few-shot learning.
In the future, we plan to apply this method to Knowledge Graph applications, includ-
ing question answering, text classification, and event extraction.
Acknowledgments: The authors would like to thank all anonymous reviewers and editors for their
valuable suggestions to improve this paper.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Huang, S.; Liu, J.; Korn, F.; Wang, X.; Wu, Y.; Markowitz, D.; Yu, C. Contextual fact ranking and its applications in table synthesis
and compression. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
(KDD), Anchorage, AK, USA, 4–8 August 2019; pp. 285–293.
2. Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. In Proceed-
ings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2–4 November 2018;
pp. 2205–2215.
3. Chen, C.; Li, C. Zs-bert: Towards zero-shot relation extraction with attribute representation learning. In Proceedings of the
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,
Online, 6–11 June 2021; pp. 3470–3479.
4. Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of
the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language
Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011.
5. Riedel, S.; Yao, L.; McCallum, A. Modeling relations and their mentions without labeled text. In Proceedings of the Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, Barcelona, Spain, 19–23 September 2010; pp. 148–163.
6. Chia, Y.; Bing, L.; Soujanya, P.; Luo, S. RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation
Triplet Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland,
22–27 May 2022; pp. 45–47.
7. John, K.; Ilias, C.; Prodromos, M.; Ion, A. GREEK-BERT: The Greeks visiting Sesame Street.
In Proceedings of the 11th Hellenic Conference on Artificial Intelligence, New York, NY, USA, 2 September 2020; pp. 110–117.
8. Pundlik, S.; Dasare, P.; Kasbekar, P.; Gawade, A.; Gaikwad, G.; Pundlik, P. Multiclass classification and class based sentiment
analysis for hindi language. In Proceedings of the 2016 International Conference on Advances in Computing, Communications
and Informatics (ICACCI), Jaipur, India, 21–24 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 512–518.
9. Mukesh, Y.; Varunakshi, B. Semi-supervised mix-hindi sentiment analysis using neural network. In Proceedings of the 9th
International Conference on Cloud Computing, Data Science & Engineering (Confluence), IEEE, Noida, India, 10–11 January
2019; pp. 309–314.
10. Tang, J.; Wang, Y.; Zheng, K.; Mei, Q. End-to-End Learning for Short Text Expansion. In Proceedings of the 23rd ACMSIGKDD
International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1105–1113.
11. Shi, M.; Tang, Y.; Liu, J. Functional and Contextual Attention-Based LSTM for Service Recommendation in Mashup Creation. In
IEEE Transactions on Parallel and Distributed Systems, Piscataway, NJ, USA, 6–8 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–10.
12. Cheng, F.; Yada, S.; Tanaka, R.; Aramaki, E.; Kurohashi, S. JaMIE: A Pipeline Japanese Medical Information Extraction System.
arXiv 2021, arXiv:2111.04261. [CrossRef]
13. Gordeev, D.I.; Davletov, A.A.; Rey, A.I.; Akzhigitova, G.R.; Geymbukh, G.A. Relation extraction dataset for the Russian language. In
Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies “Dialogue”, Seoul,
Republic of Korea, 20–22 July 2020; pp. 1–13.
14. Hennig, L.; Truong, P.T.; Gabryszak, A. MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation
Extraction in the Mobility Domain. arXiv 2021, arXiv:2108.06955.
15. Hamdi, A.; Pontes, E.L.; Boros, E.; Nguyen, T.T.H.; Hackl, G.; Moreno, J.G.; Doucet, A. A Multilingual Dataset for Named
Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers. In Proceedings of the SIGIR ‘21: The 44th
International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021.
16. Xin, X.; Chen, X.; Zhang, N.; Xie, X.; Chen, X.; Chen, H. Towards Realistic Low-resource Relation Extraction: A Benchmark with
Empirical Baseline Study. arXiv 2022, arXiv:2210.10678. [CrossRef]
17. Rami, A.; Andreas, V.; Ryan, M. Leveraging type descriptions for zero-shot named entity recognition and classification. In
Proceedings of the ACL, Bangkok, Thailand, 1–6 August 2021; pp. 1516–1528.
18. Yoshua, B.; Réjean, D.; Pascal, V.; Christian, J. A neural probabilistic language model. In Proceedings of the Fourteenth Annual
Neural Information Processing Systems (NIPS) Conference, Denver, CO, USA, 27 November–2 December 2000; pp. 932–938.
19. Cui, L.; Wu, Y.; Liu, J.; Yang, S.; Zhang, Y. Template-based named entity recognition using BART. In Proceedings of the Findings
of ACL-IJCNLP, Bangkok, Thailand, 1–6 August 2021; pp. 1835–1845.
20. Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. In
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1476–1488.
21. Ye, H.; Zhang, N.; Deng, S.; Chen, M.; Tan, C.; Huang, F.; Chen, H. Contrastive Triple Extraction with Generative Transformer.
In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Event, 2–9 February 2021;
pp. 14257–14265.
Appl. Sci. 2023, 13, 4636 15 of 16
22. Omer, L.; Minjoon, S.; Eunsol, C.; Luke, Z. Zero-shot relation extraction via reading comprehension. In Proceedings of the CoNLL,
Vancouver, BC, Canada, 3–4 August 2017; pp. 333–342.
23. Wang, Y.; Yu, B.; Zhang, Y.; Liu, T.; Zhu, H.; Sun, L. TPLinker: Single-stage Joint Extraction of Entities and Relations through
Token Pair Linking. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13
December 2020; pp. 1572–1582.
24. Wang, Y.; Sun, C.; Wu, Y.; Zhou, H.; Li, L.; Yan, J. UniRE: A Unified Label Space for Entity Relation Extraction. In Proceedings of
the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural
Language Processing, Online, 1–6 August 2021; pp. 220–231.
25. Yu, B.; Zhang, Z.; Shu, X.; Liu, T.; Wang, Y.; Wang, B.; Li, S. Joint Extraction of Entities and Relations Based on a Novel
Decomposition Strategy. In Proceedings of the ECAI 2020—24th European Conference on Artificial Intelligence, Santiago De
Compostela, Spain, 29 August–8 September 2020; pp. 2282–2289.
26. Abiola, O.; Andreas, V. Zero-shot relation classification as textual entailment. In Proceedings of the First Workshop on Fact Extraction and VERification, Brussels, Belgium, 1 November 2018; pp. 72–78.
27. Laria, R.; Kyle, M. Prompt programming for large language models: Beyond the few-shot paradigm. In Proceedings of the CHI,
Yokohama, Japan, 8–13 May 2021; pp. 1–10.
28. Ding, B.; Liu, L.; Bing, L.; Canasai, K.; Thien, H.; Shafiq, J.; Luo, S.; Chunyan, M. DAGA: Data augmentation with a generation
approach for low-resource tagging tasks. In Proceedings of the EMNLP, Online, 16–20 November 2020; pp. 6045–6057.
29. Han, X.; Zhu, H.; Yu, P.; Wang, Z.; Yao, Y.; Liu, Z.; Sun, M. Fewrel: A large-scale supervised few-shot relation classification dataset
with state-of-the-art evaluation. In Proceedings of the EMNLP, Brussels, Belgium, 31 October–4 November 2018; pp. 4803–4809.
30. Das, R.; Nelakantan, A.; Belanger, D.; McCallum, A. Chains of reasoning over entities, relations, and text using recurrent neural
networks. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics,
Valencia, Spain, 3–7 April 2017; pp. 132–141.
31. Huang, L.; Ji, H.; Kyunghyun, C.; Ido, D.; Sebastian, R.; Clare, V. Zero-shot transfer learning for event extraction. In Proceedings
of the ACL, Melbourne, Australia, 15–20 July 2018; pp. 2160–2170.
32. Ji, G.; Liu, K.; He, S.; Zhao, J. Distant supervision for relation extraction with sentence-level attention and entity descriptions. In
Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017; pp. 2506–2512.
33. Shang, Y.; Huang, H.; Mao, X. OneRel: Joint Entity and Relation Extraction with One Module in One Step. In Proceedings of the
60th Annual Meeting of the Association for Computational Linguistics and the 12th International Joint Conference on Natural
Language Processing, ACL/IJCNLP 2022, Dublin, Ireland, 22–27 May 2022; pp. 22–27.
34. Yuan, Y.; Zhou, X.; Pan, S.; Zhu, Q.; Song, Z.; Guo, L. A Relation-Specific Attention Network for Joint Entity and Relation
Extraction. In Proceedings of the Twenty Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15
January 2021; pp. 4054–4060.
35. Fu, T.; Li, P.; Ma, W. Graphrel: Modeling text as relational graphs for joint entity and relation extraction. In Proceedings of the
57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1409–1418.
36. Abiderexiti, K.; Halike, A.; Maimaiti, M.; Wumaier, A.; Yibulayin, T. Semi-automatic corpus expansion for Uyghur named entity
relation based on a hybrid method. In Proceedings of the 11th International Conference on Language Resources and Evaluation,
European Language Resources Association, Miyazaki, Japan, 7–12 May 2018; pp. 1–5.
37. Abiderexiti, K.; Maimaiti, M.; Yibulayin, T.; Wumaier, A. Annotation schemes for constructing Uyghur named entity relation
corpus. In Proceedings of the 2016 International Conference on Asian Language Processing (IALP), Tainan, Taiwan, 21–23
November 2016; pp. 103–107.
38. Aili, M.; Xialifu, A.; Maihefureti, M.; Maimaitimin, S. Building Uyghur dependency treebank: Design principles, annotation
schema and tools. In Proceedings of the Worldwide Language Service Infrastructure, WLSI 2015, Lecture Notes in Computer
Science, Kyoto, Japan, 22–23 January 2015; pp. 124–136.
39. Gong, H.; Chen, G.; Liu, S.; Yu, Y.; Li, G. Cross-modal self-attention with multi-task pre-training for medical visual question
answering. In Proceedings of the 2021 International Conference on Multimedia Retrieval, Taipei, Taiwan, 21–24 August 2021;
pp. 456–460.
40. Taghizadeh, N.; Faili, H.; Maleki, J. Cross-Language Learning for Arabic Relation Extraction. Procedia Comput. Sci. 2018, 142,
190–197. [CrossRef]
41. Hur, Y.; Son, S.; Shim, M.; Lim, J.; Lim, H. K-EPIC: Entity-Perceived Context Representation in Korean Relation Extraction. Appl.
Sci. 2021, 11, 11472. [CrossRef]
42. Maimaiti, M.; Wumaier, A.; Abiderexiti, K.; Yibulayin, T. Bidirectional long short-term memory network with a conditional
random field layer for Uyghur part-of-speech tagging. Information 2017, 8, 157. [CrossRef]
43. Parhat, S.; Ablimit, M.; Hamdulla, A. A Robust Morpheme Sequence and Convolutional Neural Network-Based Uyghur and
Kazakh Short Text Classification. Information 2019, 10, 387. [CrossRef]
44. Halike, A.; Abiderexiti, K.; Yibulayin, T. Semi-Automatic Corpus Expansion and Extraction of Uyghur-Named Entities and
Relations Based on a Hybrid Method. Information 2020, 11, 31. [CrossRef]
45. Mike, L.; Liu, Y.; Naman, G.; Marjan, G.; Abdelrahman, M.; Omer, L.; Veselin, S.; Luke, Z. Bart: Denoising sequence-to-sequence
pretraining for natural language generation, translation, and comprehension. In Proceedings of the ACL, Seattle, WA, USA, 5–10
July 2020; pp. 7871–7880.
Appl. Sci. 2023, 13, 4636 16 of 16
46. Li, X.; Arianna, Y.; Chai, D.; Zhou, M.; Li, J. Entity-relation extraction as multi-turn question answering. In Proceedings of the
ACL, Florence, Italy, 28 July–2 August 2019; pp. 1340–1350.
47. Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI, Austin,
TX, USA, 25–30 January 2015; pp. 2181–2187.
48. Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint extraction of entities and relations based on a novel tagging scheme. In
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver,
BC, Canada, 30 July–4 August 2017; pp. 1227–1236.
49. Tapas, N.; Hwee, T. Effective modeling of encoder-decoder architecture for joint entity and relation extraction. In Proceedings of
the AAAI, New York, NY, USA, 7–12 February 2020; pp. 8528–8535.
50. Giovanni, P.; Ben, A.; Jason, K.; Ma, J.; Alessandro, A.; Rishita, A.; Cicero, N.; Stefano, S. Structured prediction as translation
between augmented natural languages. In Proceedings of the ICLR, Virtual Conference, Formerly Addis Ababa ETHIOPIA,
Virtual, 26 April–1 May 2020; pp. 1–26.
51. Miwa, M.; Sasaki, Y. Modeling joint entity and relation extraction with table representation. In Proceedings of the 2014 Conference
on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1858–1869.
52. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Minneapolis, MN, USA, 3–5 June 2019; pp. 4171–4186.
53. Jiang, T.; Zhao, T.; Qin, B.; Liu, T. The role of “condition”: A novel scientific knowledge graph representation and construction
model. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD),
Anchorage, AK, USA, 4–8 August 2019; pp. 1634–1642.
54. Hao, J.; Chen, M.; Yu, W.; Sun, Y.; Wang, W. Universal representation learning of knowledge bases by jointly embedding instances
and ontological concepts. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data
Mining (KDD), Anchorage, AK, USA, 4–8 August 2019; pp. 1709–1719.
55. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring
human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC,
Canada, 9–12 June 2008; pp. 1247–1250.
56. Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; Zhang, W. Knowledge vault: A
web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Doha, Qatar, 25–29 October 2014; pp. 601–610.
57. Adam, R.; Colin, R.; Noam, S. How much knowledge can you pack into the parameters of a language model? In Proceedings of
the EMNLP, Hong Kong, China, 16–20 November 2020; pp. 5418–5426.
58. Wang, J.; Lu, W. Two are better than one: Joint entity and relation extraction with table-sequence encoders. In Proceedings of the
EMNLP, Hong Kong, China, 16–20 November 2020; pp. 1706–1721.