arXiv:2202.11451v1 [cs.CL] 23 Feb 2022
... especially in the zero-shot cross-lingual setting.

• We propose a novel label word initialization method to improve the transferability of prompts across languages.

• We conducted experiments in 5 languages to demonstrate the effectiveness of the model, and designed a detailed ablation study to analyze the role of each module.

2 UniPrompt

2.1 Overview

The major differences between UniPrompt and existing prompt-based methods lie in two parts: template representation and label word initialization.

If a cross-lingual unified prompt directly uses existing tokens from the vocabulary, it will be biased towards some specific languages, which will harm cross-lingual transfer due to the gap between languages. To alleviate this problem, the first goal of template design in this task is that the template must not depend on any specific language. An intuitive way to achieve this goal is to use a soft prompt, which consists of artificial tokens that have nothing to do with specific languages. However, these artificial tokens: i) will not be adequately trained, due to the small amount of data in few-shot scenarios; ii) do not appear in the pre-training stage. Therefore, the goal of the prompt, which is to activate the potential knowledge of PLMs, may not be achieved. Given the problems of soft prompts, the second goal of
                                              En      De      Es      Fr      Ja      Zh
Average characters per review                 178.8   207.9   151.3   159.4   101.4   51.0
Number of reviews for training/development    k×5     -       -       -       -       -
Number of reviews for testing                 -       5,000   5,000   5,000   5,000   5,000
Table 1: Statistics of MARC data used in our paper. k is the number of training samples per class.
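The k×5 entry in Table 1 corresponds to drawing k reviews for each of five classes. A minimal sketch of such a k-shot split follows, assuming the five classes are the 1 to 5 star ratings and that the data is a list of (text, label) pairs. The helper name `sample_k_shot`, the seed handling, and the toy data are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k, seed=0):
    """Pick k examples per class from a list of (text, label) pairs."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"class {label!r} has only {len(pool)} examples, need {k}")
        # Sample without replacement so that no review is repeated.
        subset.extend(rng.sample(pool, k))
    return subset


# Toy data: 5 classes (assumed star ratings 1-5) with 20 placeholder reviews each.
toy_reviews = [
    (f"review {i} rated {stars} stars", stars)
    for stars in range(1, 6)
    for i in range(20)
]

train = sample_k_shot(toy_reviews, k=4)
```

With k=4 the resulting subset holds 4×5 = 20 reviews, four per class, matching the k×5 count in the table.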
... help of the multilingual PLM, the template tower can make the template easy to transfer across languages.

[Figure: model diagram; surviving labels: label word i, label word j, label word m, verbalizer, Out: Label j]
2.3 Initialization of Soft Label Words
[Figure: model diagram; surviving labels: LM Head, PLM Encoder Layer p+1]
Table 2: Main results. k is the number of training samples per class (i.e. k-shot).