TTS Exogenous Fine Standards
1. Background
So that the quality of externally sourced fine-annotation data can be assessed, the supplier must
follow the specifications below.
2. Format requirements
1. File format: The file is a .txt text file, encoded as UTF-8 without a BOM.
2. Content format: odd-numbered lines are sentence (text) lines, and even-numbered lines are phoneme lines (explained below).
Example:
W AY1 / AE1 M / AY1 / D UW1 . IH0 NG / EH1 V . R IY0 . TH IH2 NG / AH0 . L OW1 N
00002 If the housing boom picks- up for one last leg%, regulators will be inclined to crack
down on questionable practices%.
00003 That might have even called for- a tip/for the poor telegraph boy%, yikes%, can't let
that happen%!
DH AE1 T / M AY1 T / HH AE1 V / IY1 . V AH0 N / K AO1 L D / F AO1 R / AH0 / T IH1 P / F AO1 R /
DH AH0 / P UW1 R / T EH1 . L AH0 . G R AE2 F / B OY1 / Y AY1 K S / K AE1 N T / L EH1 T / DH
AE1 T / HH AE1 . P AH0 N
00007 Five years later%, negotiations have failed to produce a boundary accepted by both
sides%.
F AY1 V / Y IH1 R Z / L EY1 . T ER0 / N IH0 . G OW2 . SH IY0 . EY1 . SH AH0 N Z / HH AE1 V / F
EY1 L D / T UW1 / P R AH0 . D Y UW1 S / AH0 / B AW1 N . D AH0 . R IY0 / AE0 K . S EH1 P . T
IH0 D / B AY1 / B OW1 TH / S AY1 D Z
Y UW1 / W AA1 . N AH0 / S IY1 / M IY1 / S EH1 L F / D IH0 . F EH1 N D / M AY2 . S EH1 L F
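The two file-level requirements (UTF-8 without a BOM, alternating text and phoneme lines) can be checked mechanically. The following is a minimal Python sketch, not part of the specification. The function names `check_file_bytes` and `pair_lines` are hypothetical, and the sketch assumes that each text line and each phoneme line occupies exactly one physical line in the delivered file.

```python
import codecs


def check_file_bytes(data: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of an annotation file."""
    problems = []
    # The file must not start with the UTF-8 byte order mark (EF BB BF).
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    # The file must decode cleanly as UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")
        return problems
    # Text and phoneme lines alternate, so the line count must be even.
    if len(text.splitlines()) % 2 != 0:
        problems.append("odd number of lines: a text line has no phoneme line")
    return problems


def pair_lines(text: str) -> list[tuple[str, str]]:
    """Pair each odd-numbered (text) line with the following (phoneme) line."""
    lines = text.splitlines()
    return list(zip(lines[0::2], lines[1::2]))
```

A file that passes `check_file_bytes` can then be handed to `pair_lines` so that each sentence is reviewed together with its phoneme line.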
I. Text line
1. Format: "Sentence ID + TAB + Sentence Content"
2. Sentence ID: Retain the ID of the original text; it must not be modified
3. Prosody annotations
a. Use "-" for linking (continuous reading), "/" for prosodic phrases, and "%" for intonation phrases
b. "/" and "%" are written with no space on either side; "-" has no space before it and, as in
"picks- up" in the example above, is followed by a space
c. The sentence must end with "%", placed immediately before the sentence-final punctuation
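The text-line rules lend themselves to a simple automated check. The sketch below is illustrative only: `check_text_line` is a hypothetical helper, the regular expressions are one possible reading of the rules, and for "-" only the preceding side is checked. It assumes the line is passed in without its trailing newline.

```python
import re


def check_text_line(line: str) -> list[str]:
    """Return a list of rule violations for one text line."""
    problems = []
    # Format: Sentence ID + TAB + Sentence Content.
    sent_id, sep, content = line.partition("\t")
    if not sep or not sent_id or not content:
        return ["expected 'Sentence ID<TAB>Sentence content'"]
    # "/" and "%" must not have a space on either side.
    if re.search(r"\s[/%]|[/%]\s", content):
        problems.append('space next to "/" or "%"')
    # "-" must not be preceded by a space.
    if re.search(r"\s-", content):
        problems.append('space before "-"')
    # The final "%" sits just before the sentence-final punctuation.
    if not re.search(r"%[^\w\s%/-]*$", content):
        problems.append('sentence does not end with "%"')
    return problems
```

For instance, `check_text_line("00001\tHello.")` reports that the closing "%" is missing, while the sentences 00003 and 00007 from the example pass without problems once each is joined onto a single line.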
b. Only the phonemes of words are labeled, not punctuation; phonemes are separated from each
other by a SPACE
c. Syllable boundaries are marked with "." and word boundaries with "/"; each boundary symbol is
separated from the symbols before and after it by a SPACE
d. The word boundaries of the phoneme line must correspond one-to-one with the word
segmentation of the text line
e. No "/" or "." is added after the last phoneme of the sentence
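The boundary rules for the phoneme line make it straightforward to parse a line into words, syllables, and phonemes and to compare the word count with the text line. This is a rough sketch under stated assumptions: the names `parse_phoneme_line`, `text_words`, and `words_align` are hypothetical, and the text-side tokenization (treating "/" as a word separator and stripping punctuation and prosody markers from token edges) is a simplification that may not match the supplier's actual word segmentation.

```python
import string


def parse_phoneme_line(line: str) -> list[list[list[str]]]:
    """Split a phoneme line into words -> syllables -> phonemes."""
    words, word, syllable = [], [], []
    for token in line.split():
        if token == ".":        # syllable boundary
            word.append(syllable)
            syllable = []
        elif token == "/":      # word boundary
            word.append(syllable)
            words.append(word)
            word, syllable = [], []
        else:                   # an ordinary phoneme
            syllable.append(token)
    # Close the last syllable and word (no trailing "/" or "." is expected).
    word.append(syllable)
    words.append(word)
    return words


def text_words(line: str) -> list[str]:
    """Extract the words of a text line, dropping punctuation and prosody marks."""
    _, _, content = line.partition("\t")
    tokens = content.replace("/", " ").split()
    stripped = [tok.strip(string.punctuation) for tok in tokens]
    return [w for w in stripped if w]


def words_align(text_line: str, phoneme_line: str) -> bool:
    """Check that both lines contain the same number of words."""
    return len(text_words(text_line)) == len(parse_phoneme_line(phoneme_line))
```

A mismatch in `words_align` only says that the counts differ; locating the offending word still requires comparing the two lines by eye.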
i. Every segment that can carry stress has exactly one stress mark
ii. Primary stress is marked with "1", secondary stress with "2", and unstressed segments
with "0"
iv. Each word has exactly one primary stress, except for words spelled out letter by letter
(e.g. "ABC"), which may have several primary stresses
b. Tone
ii. The last phoneme of each syllable is marked with "_X", where "X" indicates the tone type
(1, 2, 3, ...)
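The stress-marking rules for the phoneme line can be partly verified per word. The sketch below is an illustration, not the specification's own tooling: `stress_problems` is a hypothetical helper, it assumes English phoneme tokens of the form seen in the example (uppercase letters with an optional trailing digit), and it does not handle "_X" tone marks. Because the example contains unstressed one-syllable words such as "AH0", it flags only words with more than one primary stress; spelled-out words such as "ABC" would need to be exempted by the caller.

```python
import re

# Uppercase phoneme symbol followed by an optional stress digit string.
STRESS_RE = re.compile(r"^([A-Z]+)([0-9]*)$")


def stress_problems(word_phonemes: list[str]) -> list[str]:
    """Return stress-marking problems for the phonemes of one word."""
    problems = []
    primary = 0
    for phoneme in word_phonemes:
        match = STRESS_RE.match(phoneme)
        if match is None:
            problems.append(f"unrecognized phoneme token: {phoneme}")
            continue
        digits = match.group(2)
        # Only "0", "1", and "2" are valid stress marks.
        if digits and digits not in ("0", "1", "2"):
            problems.append(f"invalid stress mark on {phoneme}")
        if digits == "1":
            primary += 1
    if primary > 1:
        problems.append(f"{primary} primary stresses in one word")
    return problems
```

Called on the phonemes of "telegraph" from the example (`T EH1 L AH0 G R AE2 F`), the function returns an empty list.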
3. Labeling standards
I. Text Proofreading
• Grammar: If the speaker spontaneously corrects a grammatical error in the text, change the text to
match the recording; for example, if the text is 'She love cat' and the actual reading is 'She loves
cats', the text must be changed to 'She loves cats'. If the speaker does not change it, the text does
not need to be changed
• Slip of the tongue: The speaker reads one word as another; for example, if the speaker misreads
'Indus civilization' as 'indigenous civilization', the text must be changed to 'indigenous'
• Added words: Content added by the speaker must be added to the text
• Missing words: Content omitted by the speaker must be deleted from the text
• Case:
○ Words that should not be capitalized but are: a sentence such as 'I want to dance With you.'
sometimes appears. Such typographical errors must be corrected according to the proper
capitalization rules
• Punctuation:
▪ Missing spaces: If a space needed to separate words or sentences is missing, add it.
▪ A punctuation mark must not have a space both before and after it
○ Hyphens: Split or merge according to whether a prosodic break is perceived; for example,
"T-shirt" forms a single prosodic word and is still labeled as one word. If the hyphen joins two
separate prosodic words, the hyphen is removed, a space is added, and the phoneme line must
label them as two separate words.
Example:
Before alignment:
Pick-me-up.
After alignment:
Pick me up.
• Pronunciation variation: Phoneme and tone annotations do not need to reflect regular phoneme
variants or connected-speech variations, but must reflect irregular changes caused by the speaker's
personal pronunciation habits, including stress shift.
• Loanwords: Label strictly according to the actual pronunciation. If a phoneme falls outside the
phoneme set of the language (for example, a foreign word pronounced as in English), do not label it
with an approximate phoneme of the language; record the problem and report it to Party A.
• Consistency: Phoneme, stress, syllable-division, and tone annotations must be consistent
throughout.
III. Prosody Annotations
• Linking (continuous reading): Party B is asked to propose a reasonable and objective standard for
judging linking in the language, and to apply it after consultation with Party A.
• Prosodic phrases: Mark prosodic phrases based on the combined characteristics of pauses and
phrase-final lengthening.
• Intonation phrases: Mark intonation phrases based on the combined characteristics of pauses,
phrase-final lengthening, pitch reset, and other features.
• Consistency: Prosody annotations must be perceptually consistent throughout.