


1. Project overview
1. Audio content: reading/meeting/dialogue, mono.

2. Accuracy requirement: 90% sentence accuracy.

Checking is very strict and punctuation errors are counted as errors, so please be careful.

2. Invalid speech judgment

Unified reason for all invalid audio: #NG

Invalid types:
1. Audio contains sounds, other than the speaker's voice, that are louder than the main speaker's voice.

2. Spoken in a non-target language.

3. Contains personal or privacy-sensitive information.

4. Parts are inaudible and input levels are inconsistent.

5. Extremely quiet or extremely loud audio.

6. Obviously not a native speaker's voice.

7. More than 5 <unk> in the transcript.

8. The transcript contains only word fragments and <unk>.
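
As a rough illustration of rule 7, the sketch below counts `<unk>` occurrences in a transcript and flags it when there are more than five. The function names and the decision to count every `<unk>` (including those inside paired labels such as `[<unk>/go]`) are assumptions made for this example, not part of the guideline.

```python
import re

# Rule 7: more than 5 <unk> in the transcript makes the audio invalid (#NG).
MAX_UNK = 5
UNK_PATTERN = re.compile(r"<unk>")


def count_unk(transcript: str) -> int:
    """Count every <unk> in the transcript (assumption: paired labels count too)."""
    return len(UNK_PATTERN.findall(transcript))


def is_invalid_by_unk(transcript: str) -> bool:
    """Return True when the transcript exceeds the <unk> limit of rule 7."""
    return count_unk(transcript) > MAX_UNK
```

A transcript with exactly five `<unk>` labels still passes under this reading; only the sixth makes it invalid.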

3. Time-point interception (clipping)

3.1 Objects to clip

1. A native speaker of the target language speaking or singing with standard pronunciation and
clear meaning; a native speaker with a light accent is also acceptable.
2. In meetings/conversations, very low voices of other speakers should be ignored.

3.2 Length of time

1. Each clip must be no longer than 60 seconds (make sure each clip ends on a complete
sentence, that is, at a natural full stop).
2. Leave at least 0.1 s of silence before and after each clip, and no more than 0.5 s after
each clip.
3. No more than 0.3 s of silence is allowed within a sentence.

4. Note: Do not cut the audio into overly short fragments; merge clips wherever they can be merged.
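
The timing limits above can be checked mechanically. The following is a minimal sketch, assuming times are given in whole milliseconds and that the start and end of speech inside the clip are already known; the `Segment` type, its field names, and `timing_problems` are hypothetical names introduced for this example. It reads the 0.5 s limit as a cap on trailing silence only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Limits from section 3.2, expressed in milliseconds to avoid float rounding.
MAX_CLIP_MS = 60_000          # a clip may not exceed 60 s
MIN_PAD_MS = 100              # at least 0.1 s of silence before and after
MAX_TRAILING_PAD_MS = 500     # no more than 0.5 s of silence after
MAX_INNER_SILENCE_MS = 300    # no more than 0.3 s of silence inside a sentence


@dataclass
class Segment:
    start_ms: int                              # where the clip starts
    end_ms: int                                # where the clip ends
    speech_start_ms: int                       # where speech starts inside the clip
    speech_end_ms: int                         # where speech ends inside the clip
    inner_silences_ms: Tuple[int, ...] = ()    # silences measured inside sentences


def timing_problems(seg: Segment) -> List[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    if seg.end_ms - seg.start_ms > MAX_CLIP_MS:
        problems.append("clip longer than 60 s")
    if seg.speech_start_ms - seg.start_ms < MIN_PAD_MS:
        problems.append("less than 0.1 s of silence before speech")
    trailing = seg.end_ms - seg.speech_end_ms
    if trailing < MIN_PAD_MS:
        problems.append("less than 0.1 s of silence after speech")
    if trailing > MAX_TRAILING_PAD_MS:
        problems.append("more than 0.5 s of silence after speech")
    if any(s > MAX_INNER_SILENCE_MS for s in seg.inner_silences_ms):
        problems.append("more than 0.3 s of silence inside a sentence")
    return problems
```

For example, `timing_problems(Segment(0, 5000, 100, 4800))` returns an empty list, because the clip is 5 s long with 0.1 s of leading and 0.2 s of trailing silence.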

3.3 Cases that are not clipped (invalid fragments)

If any of the following applies to the entire audio, the audio selection is invalid and no clipping or
transcription is required.
1. No Speech

a. The main speaker: the audio contains only modal words, or self-talk and other words unrelated to the topic.

b. Electronic equipment: speech played through electronic devices, such as television broadcasts
and public announcements.
2. Sound Cut-off (start or end)

a. The audio is truncated or cut off.

b. Note: A drawn-out sound at the end that is cut off partway does not belong to this category;
transcribe the text normally.
3. Other languages

a. Speech in a non-target language, for example communicating in Japanese in a French project.

b. Proper nouns in other languages, which may be kept as the case requires, do not fall into this category.
c. Speech too short to be sure that it is in the target language is also treated as another language.

4. Unable to Transcribe

a. When listening at a moderate volume, the spoken content cannot be transcribed accurately
(whether because of noise or because of the speaker's unclear pronunciation).
5. Not fluent, or spoken incorrectly

a. Contains excessive pauses, stuttering, misspeaking, rephrasing, incomplete speech, or
excessively broken speech/words with pauses of more than 1 second in between (not between
phrases or sentences).
b. Example: "I turn the corner (long pause), and, uh, how do I (long pause) Oh, on the right, what's the name of
the store?"
However, if a filler particle merely appears at the beginning or in the middle of the sentence, the
sentence is transcribed normally.
c. in,fo,r,ma,tion Speaking words and phrases sound by sound like this (not speaking slowly) is also
6. Inappropriate expression

a. Sensitive information (phone number, ID number, address, etc.). Celebrities are an exception and
can be transcribed normally.
b. Manifestly inappropriate expressions for the purpose of mischief or entertainment.

c. Speech that makes no sense in part or in whole, not because of unknown words but because it is
obvious nonsense.
7. Unknown words

a. Words you don't understand.

4. Marking rules

4.1 Write what you hear

• The transcription must match the pronunciation heard exactly: no extra words, missing words,
or wrong words.
• The transcribed content must be as complete as the actual speech; nothing may be deleted.

4.2 Numbers,or words containing numbers

• Numbers are spelled out the way they are normally read in the language. Do not use Arabic numerals.

• Correct: one, two, three

• Error: 123
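
A check for this rule can be sketched in a few lines. The version below assumes that the only legitimate digit in a transcript is the `1` that directly follows an opening bracket in labels such as `[1JAL/japan airlines]`; the function name `has_arabic_numerals` is introduced here for illustration only.

```python
import re

# Assumption: a "1" right after "[" is a label marker, not a number in the speech.
LABEL_MARKER = re.compile(r"\[1")
ARABIC_DIGIT = re.compile(r"[0-9]")


def has_arabic_numerals(transcript: str) -> bool:
    """Return True if the transcript contains digits outside the label marker."""
    without_markers = LABEL_MARKER.sub("[", transcript)
    return ARABIC_DIGIT.search(without_markers) is not None
```

With this sketch, `has_arabic_numerals("one, two, three")` is `False` and `has_arabic_numerals("123")` is `True`.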

4.3 Punctuation and special marks

• Add punctuation appropriately according to the formal writing conventions and semantics of the
target language; every sentence must end with closing punctuation. (Pay attention to full-width
versus half-width characters, consistent with target-language conventions.)
• Note: Punctuation is added according to semantics, not pauses.
• In principle, the symbols ($ % & ¥ + - =) should not be used. Spelling out is the rule. (Correct: and
5. Labels
Each entry gives the label, its description, examples, and points of attention.

Label: [1/]
Description:
1. Separates cases where an acronym is written and pronounced differently.
2. Separates transliterated words (foreign words, dialects, neologisms, etc.). Format: [loanword notation/following the sound]
3. Separates wrong words.
Example:
1. [1JAL/japan airlines] (pronounced Japan Airlines)
2. [1patata/potato],[aguacate/avocado]
3. very [1good/wood]
Attention: On the platform, just select the ||[1|| TAB; the last two will be displayed automatically.

Label: [<fil>/]
Description: Modal particle. Note: The first letter of the particle is not capitalized even when it is at the beginning of the sentence, and spaces are added between words as usual.
Example:
[<fil>/eh] It is too late.
It is [<fil>/eh] too late.
Attention: On the platform, just select the ||[<fil>/|| TAB; the last one will be displayed automatically.

Label: [<unk>]
Description:
1. Inaudible parts.
2. Misspeaking, changing one's words, etc.
[<unk>] cannot be used inside a word.
If [<unk>] is used at the end of a sentence as part of that sentence, put a period after it; if the sentence before [<unk>] is already complete, do not add a period after [<unk>].
Example:
1. Correct: It is too [<unk>].
Error: It is too qu[<unk>]t.
Available: It is too [quiet/<unk>].
2. [<unk>/actual pronunciation]
Misspeaking (example): [<unk>/blay]
Correcting oneself (example): I want to [<unk>/go] play a game.
Attention: Take care to distinguish ||[<unk>]||, ||/<unk>]||, and ||<unk>/]||. If you select the ||[|| TAB, ||/<unk>]|| will be displayed together. If you select ||/<unk>]||, ||]|| will be displayed together.

Label: [<lgh>]
Description: Laughter. Add this label for individual laughs of any length. When laughter is superimposed on speech, ignore the laughter and transcribe the speech.
Attention: If there is no clear pause (200 milliseconds or more, as a guideline) and the sounds can be judged to be connected, there is no need to add multiple tags; one is enough. For example, even if the laughter changes to an inhalation or exhalation sound in the middle, it does not have to be [<lgh>] [<nos>], just [<lgh>]. Do not put a full stop after [<lgh>].

Label: [<nos>]
Description: Add this label to isolated and obvious human noise and other noises. When the noise is superimposed on speech, ignore the noise and just transcribe what is said.

Label: [<ack>]
Description: Format: [<ack>/ pronunciation]. A short expression with which the speaker responds to another person's speech and encourages them to continue. Note: Answers to other people's questions do not fall into this category.
Example: Yes, yes. (This "yes" is equivalent to an answer, so there is no need to label it like this; only the corresponding text is transcribed.)
Attention: On the platform, just select the ||[<ack>/|| TAB; ||]|| will be displayed together.
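
The label forms described in this section lend themselves to a simple syntax check. The sketch below is an assumption-laden illustration rather than the platform's actual validation: the accepted forms are inferred from the examples in this section, the leading `1` in written/spoken pairs is treated as optional because the examples show the pair both with and without it, and the names `invalid_labels` and `unk_inside_word` are hypothetical.

```python
import re
from typing import List

# Any bracketed span is treated as a candidate label.
LABEL = re.compile(r"\[[^\[\]]*\]")

# Accepted forms, inferred from the examples in this section (assumption).
VALID_FORMS = [
    re.compile(r"\[<(unk|lgh|nos)>\]"),               # [<unk>], [<lgh>], [<nos>]
    re.compile(r"\[<(fil|ack|unk)>/[^/\[\]<>]+\]"),   # [<fil>/eh], [<unk>/go]
    re.compile(r"\[[^/\[\]<>]+/<unk>\]"),             # [quiet/<unk>]
    re.compile(r"\[1?[^/\[\]<>]+/[^/\[\]<>]+\]"),     # [1JAL/japan airlines]
]


def invalid_labels(transcript: str) -> List[str]:
    """Return every bracketed label that matches none of the accepted forms."""
    return [
        m.group(0)
        for m in LABEL.finditer(transcript)
        if not any(form.fullmatch(m.group(0)) for form in VALID_FORMS)
    ]


def unk_inside_word(transcript: str) -> bool:
    """Return True if [<unk>] touches a word character, as in qu[<unk>]t."""
    return re.search(r"\w\[<unk>\]|\[<unk>\]\w", transcript) is not None
```

A transcript such as `It is [<fil>/eh] too late.` passes both checks, while `It is too qu[<unk>]t.` is caught by `unk_inside_word`.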
