

I. Description of the Collection Task

This is a spoken language annotation project.

The data must meet the following requirements:

Data: 300 hours
Language: Malay

II. Mark the data as invalid in any of the following situations. If you are unsure
about a recording, send it to the project manager in advance to confirm whether it
can be annotated.

1. Much of the speech is in a non-target language, that is, the spoken words are not Malay.
2. The speaker is obviously reading aloud or has a strong accent.
3. Background music or environmental noise is present for a long time.
4. The audio is a telephone conversation or a played-back recording.
5. The audio is truncated or contains electrical (current) noise.

III. Data Annotation Requirements

Annotation enables us to collect qualified data. We will use the Shujiajia online
annotation platform for clipping and transcription.

1. Annotation Requirements:
(1) Validity Judgment:

1) If two people speak in overlapping voices with similar volumes in a sentence and the
overlapping part is significant, the speech should be annotated as invalid. If the
overlapping part is small (only one or two words) and the content of the main speaker
can be heard clearly, the speech should be transcribed normally.
2) If a part of a sentence cannot be heard clearly and the content cannot be determined,
the sentence should be considered invalid.

3) If there is strong noise in a sentence (environmental noise, equipment noise) that
makes it difficult to hear the main speaker's content, the sentence should be judged as
invalid.

4) If a sentence has missing frames (such as "Have you had dinner?" becoming "Have
you dinner?"), it should be considered invalid.

5) If the voice is not a normal human voice (such as a machine customer service,
synthetic voice, or TV broadcast voice), the sentence should be considered invalid.

6) If a sentence contains non-native language parts, it should be considered invalid.

7) If a sentence involves sensitive information (political, religious, pornographic,
violent), it should be considered invalid.

(2) Effective Speech Extraction

1) Annotators need to consider semantic coherence and extract speech in sentence
units. Long sentences can be divided into phrases, and each sentence should not
exceed 8 seconds, but it should not be too short either. Based on annotation
experience, each natural language segment should be about 5-6 seconds on average.

2) The best position for each time boundary is at the lowest point of the waveform.

3) Speech from different speakers cannot be extracted in the same sentence.

4) When extracting, try to leave 0.2-0.3 seconds of silence around the annotated speech
segment. If no silent stretch of that length exists, do not force it. Try to extract
speech segments without sudden noise. To avoid sudden noise, the reserved time before
and after the speech can be shortened, but this must not cut off any speech.

5) Even a one-word response still needs to be extracted, and adjacent sentences
should be merged as much as possible.

6) If there is a pause in a sentence due to the speaker that lasts more than 2 seconds, it
needs to be divided into two phrases without considering the sentence meaning. If
the pause time is less than two seconds and the sentence length does not exceed 8
seconds, it can be extracted as one sentence.

7) If a person pauses in the middle of the speech for no more than two seconds, and
there is noise during the pause, resulting in incoherent or incomplete semantics
after extraction, it is not necessary to split it.
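
The timing rules in items 1) and 6) can be sketched in code. The following is a minimal illustration, not part of the annotation platform: the `Span` type and `group_spans` function are hypothetical names, and the input is assumed to be a list of voiced spans from a single speaker, already sorted by time. It merges neighbouring spans when the pause between them is no more than 2 seconds and the merged segment stays within 8 seconds, and splits otherwise. Semantic coherence and the noise exception in item 7) still need human judgment and are not modelled.

```python
from dataclasses import dataclass

MAX_SEGMENT_S = 8.0  # guideline: a sentence should not exceed 8 seconds
MAX_PAUSE_S = 2.0    # guideline: a pause longer than 2 seconds forces a split


@dataclass
class Span:
    """A stretch of continuous speech (hypothetical helper type)."""
    start: float  # seconds
    end: float    # seconds


def group_spans(spans):
    """Group time-sorted voiced spans of one speaker into candidate segments."""
    segments = []
    for span in spans:
        if segments:
            last = segments[-1]
            pause = span.start - last.end
            merged_length = span.end - last.start
            # Merge only if the pause is short and the result is not too long.
            if pause <= MAX_PAUSE_S and merged_length <= MAX_SEGMENT_S:
                segments[-1] = Span(last.start, span.end)
                continue
        segments.append(Span(span.start, span.end))
    return segments


if __name__ == "__main__":
    spans = [Span(0.0, 3.0), Span(3.5, 6.0), Span(8.5, 10.0)]
    for segment in group_spans(spans):
        print(f"{segment.start:.1f}s to {segment.end:.1f}s")
```

In this example the first two spans are merged (0.5-second pause, 6 seconds in total), while the third starts a new segment because it follows a 2.5-second pause.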

(3) Speaker Attribute Selection

1) If there are two people speaking, each person should be given a separate ID, and their
gender should be selected.

(4) Content transcription: Transcribe the content according to the audio heard, and
the transcribed content must be exactly the same as the spoken speech, without
any extra, missing, or wrong words.

1) Capitalization: If a word is usually capitalized, transcribe it according to normal
writing conventions, for example: China, Microsoft.

2) Numbers: When numbers appear in the text, do not transcribe them directly into
Arabic numerals. Instead, transcribe them into the written form of that language.
For example:

Original Article: I'm 15 years old.
Transcription: I'm fifteen years old.

Original Article: The last four digits of my phone number are 6543.
Transcription: The last four digits of my phone number are six, five, four and three.

3) Spelling words: separate the letters with a space and capitalize each letter. For
example:

Original Article: five thirty pm
Transcription: five thirty P M



4) Abbreviations: When transcribing, do not use abbreviations of words. You must use
the full word spelled out phonetically. For example:

Original Article: This is Dr. Smith
Transcription: this is doctor Smith

5) Punctuation

a. Use punctuation marks according to grammar rules.

b. Punctuation spoken by the speaker needs to be transcribed. For example, "@" is
transcribed as "at", ".com" is transcribed as "dot com".

c. Only commas (,), hyphens (-) (can only appear in the middle of words), periods
(.), exclamation marks (!), single quotes ('), and question marks (?) are allowed in
the transcription process. No other punctuation marks should be added, and any
added marks need to conform to the rules of grammar. All punctuation
marks should be in normal English input mode.
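
As an illustration of the number and punctuation rules, a transcript can be screened automatically before submission. The sketch below is an assumption about how such a check might look, not a feature of the annotation platform; `find_format_issues` is a hypothetical name. It flags Arabic numerals and any character that is not a letter, whitespace, or one of the six allowed punctuation marks. Full-width or typographic punctuation (for example a curly apostrophe) is flagged too, since marks must be typed in English input mode. It does not check grammar or whether a hyphen sits in the middle of a word.

```python
import re

# The six punctuation marks permitted by the guideline.
ALLOWED_PUNCTUATION = set(",-.!?'")


def find_format_issues(text):
    """Return a list of format problems found in one transcribed sentence."""
    issues = []
    # Numbers must be written out in words, never as digits.
    if re.search(r"\d", text):
        issues.append("contains Arabic numerals")
    # Anything that is not a letter, digit, space, or allowed mark is rejected.
    disallowed = sorted({
        ch for ch in text
        if not (ch.isalnum() or ch.isspace() or ch in ALLOWED_PUNCTUATION)
    })
    if disallowed:
        issues.append("disallowed characters: " + " ".join(disallowed))
    return issues


if __name__ == "__main__":
    for sentence in ["I'm fifteen years old.", "I'm 15 years old.", "write to a@b.com"]:
        print(sentence, "->", find_format_issues(sentence))
```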

6) Modal particles: Modal particles should be transcribed accurately
according to their pronunciation and semantics.

7) Others

a. Dirty language should be transcribed normally; do not use letters to replace it.

b. Network buzzwords and common Internet words should be transcribed
according to common usage.

c. If there are repeated words in the speech, all of them should be transcribed.

d. If the pronunciation can be determined but the semantics is uncertain, as with
common names, homophonic words can be selected to replace them, but it is
necessary to ensure that the text and pronunciation are correct. Where the context
is clear, choose words that match both the pronunciation and the meaning.

e. If a word is not finished, add "-" after it and there should be a space between it
and the next word, for example: "I want to go to s- school." Note that the end of
the sentence must be a complete word. If an unfinished word is at the end
of a sentence, it should be discarded without transcription.
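
The unfinished-word rule in item e lends itself to a simple automatic check. The sketch below is illustrative only, and `check_unfinished_words` is a hypothetical name. It reports a sentence whose last word ends in "-" (which should have been discarded) and any word that begins with a hyphen. It cannot tell a correctly marked fragment such as "s- school" from a missing space such as "s-school", because the latter looks like an ordinary hyphenated word.

```python
def check_unfinished_words(text):
    """Return problems related to the "-" marker for unfinished words."""
    issues = []
    tokens = text.split()
    # The sentence must end with a complete word, ignoring final punctuation.
    if tokens and tokens[-1].rstrip(".,!?").endswith("-"):
        issues.append("sentence ends with an unfinished word")
    # The marker goes after the fragment, so no word may start with a hyphen.
    if any(token.startswith("-") for token in tokens):
        issues.append("hyphen at the start of a word")
    return issues


if __name__ == "__main__":
    print(check_unfinished_words("I want to go to s- school."))
    print(check_unfinished_words("I want to go to s-"))
```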
Acceptance Criteria:
2. The accuracy rate of word annotation is 98% or higher.

3. The format of the annotation transcription symbols must be 100% accurate.

4. Non-target language content appearing in the collection is unacceptable.

5. Repeated submission of collected data (including duplication within the
same center) will result in data invalidation and no payment.

6. Data that requires more than three annotations for correction will be released
automatically and will not be eligible for payment.
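
The criteria do not say how the word accuracy rate is computed. One common reading, assumed here, is one minus the word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the checked reference transcript. The sketch below follows that assumption; `word_edit_distance`, `word_accuracy`, and `meets_threshold` are hypothetical names and the actual acceptance check may differ.

```python
def word_edit_distance(reference_words, submitted_words):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(submitted_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        current = [i]
        for j, sub_word in enumerate(submitted_words, start=1):
            current.append(min(
                previous[j] + 1,                          # word missing from submission
                current[j - 1] + 1,                       # extra word in submission
                previous[j - 1] + (ref_word != sub_word)  # wrong word (or match)
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference, submitted):
    """Assumed definition: 1 - edit distance / number of reference words."""
    reference_words = reference.split()
    submitted_words = submitted.split()
    if not reference_words:
        return 1.0 if not submitted_words else 0.0
    errors = word_edit_distance(reference_words, submitted_words)
    return max(0.0, 1.0 - errors / len(reference_words))


def meets_threshold(reference, submitted, threshold=0.98):
    """True if the submission reaches the 98% word accuracy criterion."""
    return word_accuracy(reference, submitted) >= threshold


if __name__ == "__main__":
    reference = "saya mahu pergi ke sekolah"
    submitted = "saya mahu pergi ke rumah"
    print(f"{word_accuracy(reference, submitted):.2f}")
```

Under this definition a single wrong word in a five-word sentence gives 80% accuracy, so the 98% threshold is only meaningful when measured over a larger batch of text.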
