TX Guidelines-Conversational 3.2.8 (Current)


Transcription Guidelines 3.2.8
(For ConversationalSpeech HitApps)

May 19, 2017


Last updated: 5/22/2017 4:08 PM

1. Summary
This document includes guidelines from the science and development teams to the transcription team,
as to how transcription is to be performed.

2. General Guidelines

Numbers
[0101] All numbers will be spelled out. E.g. “one” “two” “three”.

Special Characters
[0201] Special characters, sometimes used for short-hand notation, should not be used. The
following are examples of bad transcriptions:

• “coffee & tea”


• “i will come @ five”
• “you are #1”
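
Guidelines [0101] and [0201] can be checked mechanically. The following is a minimal sketch, not part of the official tooling: the function name `find_violations` and the set of flagged characters (only the three shown in the examples above) are assumptions, and explicit punctuation in the SYMBOL\NAME format is not handled.

```python
import re

DIGIT_RE = re.compile(r"\d")          # any digit breaks [0101]
SHORTHAND_RE = re.compile(r"[&@#]")   # only the characters from the [0201] examples

def find_violations(transcript: str) -> list[str]:
    """Return the guideline codes that a transcript appears to break."""
    problems = []
    if DIGIT_RE.search(transcript):
        problems.append("[0101] numbers must be spelled out")
    if SHORTHAND_RE.search(transcript):
        problems.append("[0201] shorthand special character used")
    return problems
```

For instance, `find_violations("you are #1")` reports both guidelines, while `find_violations("i will come at five")` returns an empty list.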

Casing
[0301] Words and names must be transcribed in lowercase at all times. Note that this also
applies to the personal pronoun “I”, which should be “i”. This also applies to brand names.
Examples:

• “i need directions to montreal”


• “maria goes to the mall in seattle”
• “do you know google”

[0302] Only acronyms and spelled-out letters should be capitalized. When a letter is spelled-
out, it must be capitalized, followed by a period, then by a space. Example:

• “directions to IBM” -> “directions to I. B. M.”
• “CDs” -> “C. D. s”

[0304] Exception: acronyms that are conventionally written in all caps but are pronounced like regular
words should be transcribed like regular words. Some examples are:

• “NAFTA” -> “nafta”
• “NATO” -> “nato”
• “MCAT” -> “mcat”
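
The formatting in [0302] and the exception in [0304] can be sketched in a few lines. This is an illustration only: the function name `transcribe_acronym` is hypothetical, and the set of acronyms pronounced as words contains just the three examples above (a real list would have to be maintained by hand).

```python
# Acronyms pronounced like regular words; only the examples given in [0304].
PRONOUNCED_AS_WORDS = {"NAFTA", "NATO", "MCAT"}

def transcribe_acronym(acronym: str) -> str:
    """Format an acronym per [0302], or lowercase it per [0304]."""
    if acronym.upper() in PRONOUNCED_AS_WORDS:
        return acronym.lower()
    parts = []
    for ch in acronym:
        if ch.isupper():
            # Spelled-out letter: capital, then a period.
            parts.append(ch + ".")
        elif parts and not parts[-1].endswith("."):
            # Continue a lowercase run such as a multi-letter suffix.
            parts[-1] += ch
        else:
            # Start of a lowercase suffix, e.g. the plural "s" in "CDs".
            parts.append(ch)
    # Each element is separated from the next by a single space.
    return " ".join(parts)
```

With these assumptions, `transcribe_acronym("IBM")` gives `I. B. M.`, `transcribe_acronym("CDs")` gives `C. D. s`, and `transcribe_acronym("NATO")` gives `nato`.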

Named Entities
[0701] Names of people, companies, songs, apps, etc. should all be transcribed in accordance
with previous rules (e.g., use lower case). Copyrighted terms (e.g., trademarks, brand names,
registered names) are also to be treated the same way.

[0702] Do not use special characters that are not typically part of a word. Examples:

• “Yahoo!” -> “yahoo”
• “Ke$ha” -> “kesha”
• “P!nk” -> “pink”

In many cases, the correct spelling of named entities may be unclear. The following four items
are to help determine correct spelling, and are provided in order of priority. For example, if (1)
applies, then (2), (3), and (4) can be ignored.

[0703] Spelling (1) -- If a name, song title, or app title is present in the client recognition box (see
In-box Assistance section), then use that spelling if that’s what was said in the audio. Only
spelling information should be borrowed from the client recognition box, however. If the client
recognition box shows names in capitalized letters, for example, you’ll need to use lowercase in
transcription. Examples:

• Client recognition box contains “Jon Wayne” -> transcribe as “jon wayne”
(ignoring that the famous actor’s name is John Wayne).

[0704] Spelling (2) -- Well-known people, companies, songs, apps, etc. should all be transcribed
with the trademarks or official spellings that they are known by (if it has special characters that
are not typically part of a word, see [0702] above; if it contains Arabic numbers, follow [0101]).
If you are unsure how to accurately spell one of these named entities, then perform a web search
to identify the correct spelling.

This applies to named entities that are intentionally misspelled, as well as initials in names. They
should be spelt in accordance with expected spellings. Examples:

• “Inglourious Basterds” -> “inglorious basterds”
• “Boyz n the Hood” -> “boyz n the hood”
• “Pet Sematary” -> “pet sematary”
• “Salt-n-Pepa” -> “salt n pepa”
• “O Canada” -> “O. canada”
• “Ice-T” -> “ice T.”

For proper names (trademarks) with hyphen, please see [0606] below.

[0705] Spelling (3) -- If the name of a person who is not well known can be taken as initials, transcribe
it as initials if there is no In-box Assistance. Example:
• “T J is here” -> “T. J. is here”

[0706] Spelling (4) -- If the correct spelling is still not clear, then make your best judgement based
on the context that you have.

Punctuation

Explicit Punctuation
[0601] If a speaker is dictating and says a punctuation mark's name, such as "exclamation point", use
standard punctuation format from Appendix A (in the future, we will enable HITApp right click insertion
from dropdown menu). If what the speaker said is not in the list provided in Appendix A, transcribe it in
the following format: “SYMBOL\NAME”: the symbol itself, a backslash, and the name of the punctuation
mark transcribed in capital letters. If the punctuation mark's name is more than one word, such as
"exclamation point" or "question mark", use an underscore instead of a space to separate the
capitalized words.

The transcription should reflect exactly what was said, and there should be space at the
beginning and end of explicit punctuation. Examples:

• !\EXCLAMATION_MARK
• !\EXCLAMATION_POINT
• .\PERIOD
• “go !\EXCLAMATION_MARK james said”
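
The fallback SYMBOL\NAME format from [0601] can be produced with a small helper. This is a sketch under assumptions: the function name `explicit_punctuation` is hypothetical, and it covers only the fallback format, not the Appendix A list, which takes precedence.

```python
def explicit_punctuation(symbol: str, spoken_name: str) -> str:
    """Build SYMBOL\\NAME: the symbol, a backslash, the name in capitals.

    Multi-word names are joined with underscores instead of spaces.
    """
    name = "_".join(spoken_name.upper().split())
    return symbol + "\\" + name
```

For example, `explicit_punctuation("!", "exclamation point")` yields `!\EXCLAMATION_POINT`. The result still needs a space before and after it when placed in the transcription.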

But in the following situations, use words, not punctuation marks or signs:

• “five point six” -> “five point six”
• “go to amazon dot com” -> “go to amazon dot com”
• “@microsoft.com” -> “at microsoft dot com”
• “#” -> “hashtag”

Implicit Punctuation
[0602] If someone did not speak punctuation, the punctuation must NOT appear in the
transcription unless it is part of a normal written form of the word such as an abbreviation.

• “Apple Inc.” -> “apple inc.”

Apostrophes
[0603] Apostrophes should be included as part of the transcription. If the apostrophe
follows a famous person’s name that ends in an initial, add a space and then the apostrophe.
Examples:

• can’t
• he’s
• john’s

• jessica jones's stalker (singular possessive)
• the joneses' new car (plural possessive)
• jesus' (singular possessive that is exception to the rule)
• jay Z. ’s

[0607] “Mind your P’s and Q’s” -> “P. ’s and Q. ’s”

Hyphens
[0604] Special attention should be paid to correct transcription of hyphens (-) for words where
the hyphen provides semantic value (such as for French numbers). In such cases, hyphens
must be included in the transcription.

• For example, quatre-vingt-neuf is the correct way of transcribing “eighty nine” in French. If this
was transcribed without hyphens (i.e., quatre vingt neuf), it could be interpreted as “four twenty
nine”.

[0605] Written-together forms like “openfaced”, which are not recognized by the MS-Word speller,
should not be used. In this case, a hyphen should be used. Examples:

• an open-faced sandwich
• a drive-by shooting
• a shut-down computer (but “a computer shutdown”)

[0606] Proper names (trademarks) with hyphen shall be transcribed with hyphens, e.g.
• “Chick-fil-a” -> “chick-fil-a”
• “Coca-Cola” -> “coca-cola”

Math symbols
[1801] Do not use math symbols. If a person says “6 divided by 3 equals 2”, transcribe “six divided
by three equals two”.

Abbreviations
[1901] Abbreviations, including common metric abbreviations must be transcribed as spoken: e.g. for
en-US:
• kilometers, centimeter, etc., not km, kms, cm, cms etc.
• “mister” instead of “mr.” or “M. R.”, etc.

[1902] But “missus” is an exception. Transcribe “missus” -> “mrs” (note: no period is allowed
here; “mrs.” is wrong!).

Restarts/Repetitions
[0901] Restarts/repetitions need not be tagged if the words are not damaged; simply transcribe
the words as uttered by the speaker. Example:

• “i i just wanted to say” (since ‘i’ is undamaged and complete)

(For a false start, partially pronounced word, or stumbled-over speech in the utterance, see <FILL/> in the
Tags section.)
(For a word truncated by audio, see <UNKNOWN/> in the Tags section. Note: a truncated word is defined as a
word with a portion cut off at the beginning or end by the audio.)

Ungrammatical words
[1001] For words which are ungrammatical in the given context, but are clearly articulated,
transcribe them as spoken. For example, if a speaker utters the plural form “bonds” in the
sample sentence below, transcribe it exactly as “bonds”.

• E.g. “find a bonds with a ten year maturity date”

Ambiguity
Language, by its very nature, can be very ambiguous – especially for very short utterances.
Ambiguity can come in many forms:

• Homophones (words with same sound but different spelling)
   o E.g., clothes/close, no/know
• Variant spellings (words with more than one acceptable spelling)
   o E.g., ax/axe, donut/doughnut, barbecue/barbeque
• Inflections
   o E.g., in French many verb conjugation forms have the same
     pronunciation, such as achète/achètes/achètent
• Proper Nouns
   o E.g., John/Jon, Catherine/Katherine, Main St vs. Maine St

[1701] For homophones, variant spellings, and inflections, simply choose the form that you
feel is most likely what the speaker intended. Make sure you factor in whatever context you
have in making these decisions.

• Audio that only contains a single word, either “no” or “know” -> “no” (because this is more
likely what was intended)
• Audio contains “call John” or “call Jon” -> “call John” (if that’s the form that you feel is
more common and there is no in-box assistance; see Guideline [1302])
• If dictionaries offer variants for the word (e.g. “T-shirt”, “tee shirt”, “tee-shirt”, etc.), use
the default spelling (it is generally the head entry in dictionaries) and transcribe
according to guideline:
   o E.g., “T-shirt” -> “T. shirt”

NOTE: To avoid unfairly penalizing transcribers in this area, we will try to avoid highly-
ambiguous examples in our Gold utterances and/or develop relaxed scoring rules. If you do
encounter ambiguous Golds, please alert the Microsoft team.

For proper nouns, see the section above on Named Entities – Spelling.

Mispronunciations
A mispronounced word is a word that the speaker clearly intended to say, but in which certain
sound(s) are said incorrectly, dropped, or have their order switched around.

[1501] For mispronunciations where the intent is clear, the standard word should be used. If
the intent is not clear, use <UNKNOWN/>. This includes cases like metathesis (transposition of
sounds, like “misocroft” for “Microsoft”) and dialectal pronunciations that are either not a word
or not a word of the same meaning in the dictionary. Examples:

• “californa” (doesn’t pronounce the ‘i’) -> “california”
• “misocroft” (if in context, we are sure what the speaker meant) -> “microsoft”
• In certain Italian dialects, the word “mangia”, meaning “eat”, is pronounced “mancia”.
Even though “mancia” is a word in Italian, it does not mean “eat”:
“mancia” -> “mangia”

[1502] For mispronunciations where the intent is not clear, the <UNKNOWN/> tag should be
used. [see <UNKNOWN/> section in Tags]

• “directions to al-” -> “directions to <UNKNOWN/>”.

[1504] If a word is only partially pronounced because the audio is cut off (either beginning or
end of audio), then the <UNKNOWN/> tag should be used. [see <UNKNOWN/> section in Tags]

Informal Words

[1601] If a speaker uses an informal pronunciation that is formally recognized as a word, then
use the informal word. Examples of informal words in en-US that are recognized as words
include: wanna, gonna, kinda, sorta, betcha, doc, etc. If in doubt, or if it is just fast speech, use the
formal word.

• “you betcha” -> “you betcha” since “betcha” is in the en-US dictionaries.

[1602] Do not create new words (e.g., fulla for full of, lika for like a). If in doubt, the following
dictionaries can be used to determine if a word is officially recognized or not:

Locale   Dictionary Name                            Dictionary URL
de-DE    Duden                                      http://www.duden.de/
en-AU    Oxford Dictionary (Australian English)     https://en.oxforddictionaries.com/
         Macquarie Dictionary                       https://www.macquariedictionary.com.au/
en-CA    Oxford Dictionary (Canadian English)       https://en.oxforddictionaries.com/
en-IN    Oxford Dictionary (Indian English)         https://en.oxforddictionaries.com/
en-GB    Oxford Dictionary (British English)        https://en.oxforddictionaries.com/
         Cambridge Dictionary (British English)     http://dictionary.cambridge.org/dictionary/essential-british-english/
en-US    Merriam-Webster Dictionary                 https://www.merriam-webster.com/
         Dictionary.com                             http://www.dictionary.com/
         Oxford Dictionary (US English)             https://en.oxforddictionaries.com/
         Cambridge Dictionary (American English)    http://dictionary.cambridge.org/dictionary/essential-american-english/
es-ES    Real Academia Española                     http://dle.rae.es/?w=diccionario
         Lleva Tilde                                http://llevatilde.es/
es-MX    Colmex                                     http://dem.colmex.mx/
fr-CA    Office québécois de la langue française    https://www.oqlf.gouv.qc.ca/accueil.aspx
         Bescherelle L'art de conjuguer             http://bescherelle.com/
         Petit Larousse                             http://www.larousse.fr/
fr-FR    Petit Larousse                             http://www.larousse.fr/
hi-IN    Shabdkosh                                  www.shabdkosh.com
         Oxford Dictionary (Hindi)                  https://hi.oxforddictionaries.com/
it-IT    Treccani                                   http://www.treccani.it/
         Zanichelli                                 https://www.zanichelli.it/
         Accademia della Crusca                     http://www.accademiadellacrusca.it/it/pagina-d-entrata
ja-JP    Sanseido's Japanese Dictionary             http://www.weblio.jp/
         Goo                                        https://dictionary.goo.ne.jp/
pt-BR    Priberam                                   https://www.priberam.pt/dlpo/
         Michaelis                                  www.michaelis.uol.com.br
zh-CN    新华字典                                   http://zd.diyifanwen.com/
         新华字典                                   http://zidian.cibiao.com/
         在线新华词典 (Online Xinhua Dictionary)    http://xh.5156edu.com/
         百度词典 (Baidu Dictionary)                http://dict.baidu.com/
         百度输入法 (Baidu Input)                   https://shurufa.baidu.com/?pz-srf-bt

[1604] If the speaker explicitly says "ha ha ha" (which is different from genuine laughter, which the
guidelines consider noise), "bowwow", "bang", etc., transcribe them if, and only if, the dictionary
considers them words.

Foreign Words

Borrowed Words
[1101] If the foreign word/language is part of the language’s regular lexicon (i.e., would be
understood by most speakers), write it in the foreign language using foreign language script.

For how to transcribe English words in Hindi, please see the “Language-specific Guidelines” for
hi-IN.

Partial-utterance Foreign Words

[1102] If only part of the utterance is in a foreign language, mark that part of the
utterance as <UNKNOWN/> and transcribe the remainder. However, if the speaker is searching for a
foreign celebrity name (like a singer), a song title, or a game title, transcribe it rather than
using <UNKNOWN/>; conduct a side-search if necessary.

Full-utterance Foreign Words


[1103] If the entire utterance is in a foreign language, use the <UNKNOWN/> tag to represent
the entire utterance.

Spelling Reform
[0501] Use post-Reform spelling rules for languages with spelling reform (German, French, Portuguese),
e.g. German: write “dass” rather than “daß”.

Cortana Pronunciation
Note: we no longer have prescribed spelling for mispronunciations of Cortana. Please follow
the new guidelines below for transcription of the word.

The word ‘cortana’ requires special attention and should be treated differently than other
words. This applies to all languages (including ja-JP “コルタナ”).

[1401] If pronunciation of ‘cortana’ is correct, then it should be treated the same as other words – i.e., it
should be transcribed as ‘cortana’.

Note the following:


1. All borderline pronunciations will be considered correct (i.e., we just want to identify outliers)
2. Multiple, correct pronunciations may exist for some locales (i.e., due to regional
accents). (Some audio samples are below-only playable in the Word version of the guideline):

[Table of embedded audio samples (.m4a attachments, up to four per locale) for de-DE, en-AU, en-CA,
en-GB, en-IN, en-US, es-ES, es-MX, fr-CA, fr-FR, it-IT, ja-JP, pt-BR, and zh-CN; not reproduced here.]

[1402] If pronunciation of ‘cortana’ is incorrect (e.g., ‘cortina’, ‘cortona’), then it should be transcribed
as ‘cortana’ (since that was the intent), but a <MP/> tag should be added after the word ‘cortana’ to
indicate that it was mispronounced (e.g., “hey cortana<MP/> tell me a joke”). MP = MisPronounced.
This also applies to partially pronounced “cortana”, where speakers did not pronounce the full word,
and if we can clearly tell “cortana” is the intended word.

• ‘hey cortan-’ (where the final ‘a’ is not spoken by the speaker) -> ‘hey
cortana<MP/>’

If we cannot be sure “cortana” is the intended word, follow [1405] below.

[1403] Use of <MP/> tag should only occur with mispronunciations of ‘cortana’. It should not be used
for any other word, even for words that are related to keywords (e.g., ‘hey’, ‘select’, ‘hola’ (es-ES), ‘san’
(ja-JP)).

[1404] If another word is spoken instead of ‘cortana’, then write that word instead. E.g., “hey siri”, “hey
google”, or “hey there”.

[1405] As with other words, if the audio is cut off during ‘cortana’, it should be transcribed as
<UNKNOWN/>.

In-box Assistance

[1301] If available, you will be provided one or two text boxes containing text (see sample
screenshot below). Both versions of text are output from Microsoft speech recognizers. As a
result, it may not be reliable information. This information is provided to transcribers,
however, as an assistance. Since the provided text is not reliable, it is extremely important
that transcribers do not simply accept this text as the final transcription. The final text
submitted MUST match the audio as closely as possible. Please don’t be biased by this
information. Generally speaking, the top text box tends to be more accurate. However, the
second text box has access to a person’s personal contact list, list of apps installed on their
devices, and list of songs in their library.

[1302] If different spellings of a name, app, or song title are provided, then use the spelling
provided in the second box. For example, if the audio contains “call john”, the first text box
contains “call john”, but the second box contains “call jon”, then the correct transcription will
contain “jon” (because this is how the name was spelled in the user’s contact list). Same for
app names and song titles.

[1303] If what you hear is different than what you see in the text provided, then write what
you hear. Remember, the provided text is only provided as assistance, and is often not
correct.

3. Language-specific Guidelines
[2001] Russian, as well as some Romance languages, has grammatical categories of gender, conjugation,
case, and declension. Sometimes, when the speech is out of context, it is difficult to determine which
specific category a word belongs to. In this case, use your native-speaker intuition and discretion.

English

O vs Oh vs Zero
• [en0101] The letter ‘o’ should be transcribed as ‘O.’ (see also “O. canada” in
Guideline [0704]).
• [en0102] If used as an exclamation, ‘oh’ should be transcribed as ‘ohh’.
• [en0103] The number zero when spoken as ‘o’ should be transcribed as ‘oh’.

Zee vs. Zed


• [en0201] Both pronunciations should be transcribed as ‘Z.’.

K vs. OK vs. Okay


• [en0301] Full pronunciation should be transcribed as ‘O. K.’, not ‘okay’.

Faithfulness
• [en0401] Use yup and yeah if this is what the speaker says. [The transcription yea
(rhymes with day) is only used for the exclamation meaning woohoo]. Note: Do
not use shortened forms like gettin' or gettin to transcribe -ing words like getting.

En-US vs. en-GB spelling
• [en0501] For en-US words, do not use British spellings, such as favourite and
colour, unless spelled that way as part of a proper noun (space shuttle Endeavour,
song Colour my World)

Japanese

Numbers

[ja0201] To differentiate cases like “72 (seven two)” and “72 (seventy-two)”, numbers should
be transcribed with Kanji characters as they are pronounced. In other words, the Kanji number
text would represent the actual pronunciation without any ambiguity.

[Examples]
“seven two” -> “Nana Ni” -> 七二
“seventy two” -> “Nana-jyu Ni” -> 七十二

Special cases of numbers:

1. Zero

[ja0202] Zero could be レイ(rei)、マル(maru)、ゼロ(zero) depending on how it’s pronounced.


“rei” and “zero” are both typically transcribed as 零. Therefore, instead of using Kanji
characters, use Katakanas as below:

[Examples]

“702” could be “七レイ二”, “七マル二” or “七ゼロ二” depending on how it’s pronounced.

2. Decimal point

[ja0203] A decimal point is transcribed as 点.

[Examples]
“7.2” is 七点二.

3. Foreign pronunciation

[ja0204] If a number is “described” in a foreign language, use katakana (e.g. “Lucky Seven” is
“ラッキーセブン”, not “ラッキー 七”).

[Examples]

1. “Windows 10”:
If pronounced “Uindouzu Ten”, “10” should be transcribed in katakana such as “テン”.
If pronounced “Uindouzu Jyu”, then it should be “十”.
2. “Boeing 747”: If pronounced as “Boingu Sebun Fo Sebun”, then it’s “ボーイング セブンフォーセブン”,
instead of “ボーイング 七四七”.
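
The digit-by-digit cases in [ja0201] to [ja0203] can be illustrated with a short sketch. The function name `digits_as_spoken` and the reading keys `rei`, `maru`, and `zero` are assumptions made for this example; it covers only numbers read digit by digit in Japanese, not readings such as 七十二 or foreign pronunciations per [ja0204].

```python
KANJI_DIGITS = {
    "1": "一", "2": "二", "3": "三", "4": "四", "5": "五",
    "6": "六", "7": "七", "8": "八", "9": "九",
}
# [ja0202]: zero is written in katakana according to how it was pronounced.
ZERO_READINGS = {"rei": "レイ", "maru": "マル", "zero": "ゼロ"}

def digits_as_spoken(digits: str, zero_reading: str = "zero") -> str:
    """Transcribe a number that was read out digit by digit."""
    out = []
    for ch in digits:
        if ch == "0":
            out.append(ZERO_READINGS[zero_reading])
        elif ch == ".":
            out.append("点")  # [ja0203]: decimal point
        else:
            out.append(KANJI_DIGITS[ch])
    return "".join(out)
```

For example, `digits_as_spoken("702", "maru")` gives 七マル二 and `digits_as_spoken("7.2")` gives 七点二.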

Kanji vs. Hiragana

[ja0301] General rule is to use the Kanji/Hiragana expression.

[Examples]

1. Hiragana is preferable (i.e., Kanji words are too formal)

<Kanji> <Hiragana>
× 但し ○ただし
×暫くして ○しばらくして
× 改めて 〇あらためて

2. But Kanji is preferable in the following (i.e., Hiragana seems more appropriate for
children)

<Kanji> <Hiragana>
〇閉じる ×とじる
〇皮膚 ×ひふ

3. Both are acceptable

<Kanji> <Hiragana>
〇明日は 〇あすは、あしたは
〇健やか 〇すこやか

Katakana vs. English


• Transcription of English words will depend on the pronunciation:
o [ja0101] Use English if the pronunciation sounds like native English.
Spoken with a Japanese accent is fine.
o [ja0102] Use Katakana if the pronunciation sounds like Japanese (i.e., like a
word that has been borrowed into the Japanese language).
o [ja0103] When it’s not clear, the default should be katakana.

Transcribe as it is spoken
[ja0401] When the speaker says an address and uses “の” for any hyphen between
numbers, the transcriber should write “の” in hiragana and not katakana.
e.g. “文京区本郷 7-3-1” -> “文京区本郷七の三の一”

The lengthening line “ー” (stretch mark)
[ja0501] The Japanese lengthening line “ー” should be used to transcribe a word only in two
cases:
a. The official entry of the word in the dictionary contains the lengthening line
e.g.ラーメン、サーモン
b. The word is a proper name which officially contains the lengthening line
e.g. ユーチューブ、ベートーヴェン

If the dictionary entry of the word or the proper name does not officially contain the
lengthening line “ー”, do not use the lengthening line when transcribing the word, even if the
speaker is lengthening it in the audio. Transcriptions should contain only words that are in
official dictionaries, or proper names that can be confirmed on the web.

Chinese

Numbers
• [zh0101] Numbers should be transcribed with Chinese characters not Arabic numerals.
• [zh0102] Transcribe both pronunciations “yi” and “yao” for Chinese digit 1 as 一
• [zh0103] 7 (guai) -> 七 and 0 (dong) -> 零

Special Characters
• [zh0201] When someone says “plus” (“加”), it does not mean that it needs to be
entered as a plus sign (“+\加”). Only when the speaker specifies that it is the plus
sign (e.g., by adding the word 号, “sign” or “punctuation”) should it be transcribed as
+\加号.
• [zh0202] Regarding retroflex articulations, transcriptions should include “儿” if it was
articulated. e.g., 打开照片 vs 打开照片儿

Spacing
• [zh0301] Generally speaking, there should be no spaces between characters.
• [zh0302] Do not insert spaces when a speaker corrects himself. E.g., 打开打打开音

• [zh0303] Spaces should be present before/after English words
• [zh0304] Spaces should be present before/after tags, if appropriate (see guidelines
on tags).
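
The spacing rules [zh0301] and [zh0303] lend themselves to a simple normalization pass. The sketch below is an assumption-laden illustration: the function name `normalize_spacing` is hypothetical, "Chinese character" is approximated by the basic CJK Unified Ideographs range, "English word" by ASCII letters, and tags ([zh0304]) are not handled.

```python
import re

CJK = r"\u4e00-\u9fff"  # basic CJK Unified Ideographs block only

def normalize_spacing(text: str) -> str:
    """Apply [zh0301] and [zh0303] to a transcription string."""
    # [zh0301]: no spaces between two Chinese characters.
    text = re.sub(rf"(?<=[{CJK}])\s+(?=[{CJK}])", "", text)
    # [zh0303]: a space where a Chinese character meets an English word.
    text = re.sub(rf"([{CJK}])([A-Za-z])", r"\1 \2", text)
    text = re.sub(rf"([A-Za-z])([{CJK}])", r"\1 \2", text)
    return text
```

For example, `normalize_spacing("打开 照片")` returns 打开照片, and `normalize_spacing("打开skype")` returns 打开 skype.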

English Brandnames
• [zh0401] Transcribe brand names in English when that is how the speaker pronounced
it. Transcribe a foreign word with Chinese characters if the speaker says it in the Chinese
way (e.g., ‘pizza’ vs ‘披萨’).

Pinyin in an utterance
• [zh0402] If a person says the Pinyin first and then the word, transcribe the word only.
Example:
• “w ang 王” -> “王”.

Actual word as fill


• [zh0402] If the filler in Chinese is an actual word, such as “嗯、啊、唉”, the character
should be used instead of <FILL/>. E.g., “嗯我们就这样把”

Cantonese (zh-HK)
Traditional vs. Simplified Chinese
• [zh-HK0101] Use traditional Chinese for transcribing Cantonese.
• [zh-HK0102] When both traditional and simplified variations are acceptable, the
traditional version should be used. e.g., Taiwan should be 臺灣 not 台灣.

French
Hyphens
• [fr0101] We are standardizing on NO hyphens for cases where the number is “cent” or
“mille”, or when numbers are connected with “et”. This is the most common
and preferred spelling in France nowadays. Examples:
o huit cent cinquante et un
o deux mille quatorze

Ouais vs Oui
• [fr0201] You may transcribe with ouais rather than oui, if this is what the speaker says.

German
Numbers
• [de0101] Cardinal numbers up to 1000 should be spelled in lower case, unless it is
unambiguously clear that they are used as substantives.
• [de0102] If, as is the case in most software names, the number is normally written with
Arabic numerals, then do it just the same as in English.

Faithfulness
• [de0201] If a speaker speaks colloquially and says “is” meaning “ist”, then transcribe as
“ist” (this is an example of Guideline [1501])
• [de0202] You may use 1st-person singular forms without the final ‘-e’, if this is what the
user says (ich geh ins kino), and conversely imperative forms with final –e (denke nicht
dran). When in doubt, use the version with the final “-e”.

“ß” vs. “ss”
• [de0203] For correct transcription of words with “ß” or “ss”, look it up in the Duden, or
duden.de.

Hindi
• [hi0101] Most English borrowed words in Hindi are not part of any formal Hindi
dictionary, yet most people speaking Hindi especially in urban areas would use these
words freely while speaking colloquially. Transcribe the English words with English
spelling
• [hi0102] If words from languages other than English are mixed with Hindi, follow
these principles:
   o Transcribe in Hindi all known Hindi words and entities;
   o otherwise, treat the word as foreign (follow the guidelines in the “Foreign Words” section).
Russian
• [ru0101] Colloquially pronounced words should be transcribed according to the
standard form that appears in the dictionary if the colloquially pronounced words are
not recognized as new words, e.g.
o “чё”, “шо” should be transcribed as “что”
o “щас” should be transcribed as “сейчас”

• [ru0102] Letter “ё” must be used where needed.

• [ru0103] Some app/product names, such as Skype and Viber, have been fully
adapted into Russian and have Russian spellings like “Скайп” and “Вайбер”. For cases
like this, use the Russian words unless the pronunciation is clearly English, in which
case use the English words.

• [ru0104] Some English words are widely used in Russian and their common written
form is in Cyrillic. Use the Russian form of such words, e.g. “окей” (Russian version of
“okay”).

• [ru0105] But for the word “email”, use the English word.

• [ru0106] Hyphen should be used for words according to the Russian grammar rules such
as: “что-то”, “по-русски”.

4. Tags

Format
• [tags0101] All tags should be in XML format (e.g., <UNKNOWN/>).
• [tags0102] Use methods provided in the UI for inserting tags (function key, right-click)
rather than spelling out the tags since spelling them out is error-prone and it is
mandatory that the tags are correctly spelled and in the correct syntax.

Spacing (high-level summary; for details, see each tag)
• [tags0201] For tags that associate with a word, the tag should be added after the word,
and with no space between the tag and the word. There must be a space separating the
tag from the other neighboring words. Examples:
i. “hey cortana<MP/> tell me a joke” (“cortana” is mispronounced, but it is clear
that the speaker meant “cortana”; for details see [1402] and [1403])
• [tags0202] Tags that denote events by themselves (e.g., <UNKNOWN/>) should be
surrounded by spaces as if they were regular words.
• [tags0203] Sentence-level tags (e.g., CNOISE, NPS) should have no space between them
if they appear together, but there should be a space between sentence level tags and
other transcription. E.g., “<CNOISE/><NPS/> hi mom”
• For space requirements related to <NIS></NIS>, see [tags0806]
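
The spacing rules [tags0201] and [tags0202] can be checked automatically. The following is a minimal sketch rather than an official validator: the function name `check_tag_spacing` is hypothetical, only <UNKNOWN/>, <FILL/>, and <MP/> are checked, and the <MP/> check assumes the Latin-script spelling “cortana”.

```python
import re

TAG_RE = re.compile(r"<[A-Z]+/>")
# Tags that stand alone like regular words ([tags0202], [tags0704]).
STANDALONE_TAGS = {"<UNKNOWN/>", "<FILL/>"}

def check_tag_spacing(transcript: str) -> list[str]:
    """Return a list of spacing problems found around tags."""
    problems = []
    for m in TAG_RE.finditer(transcript):
        tag = m.group()
        # Treat the start and end of the transcript as spaces.
        before = transcript[m.start() - 1] if m.start() > 0 else " "
        after = transcript[m.end()] if m.end() < len(transcript) else " "
        if tag in STANDALONE_TAGS and (before != " " or after != " "):
            problems.append(f"{tag} must be surrounded by spaces")
        if tag == "<MP/>":
            if not transcript[:m.start()].endswith("cortana"):
                problems.append("<MP/> must directly follow 'cortana'")
            elif after != " ":
                problems.append("<MP/> must be followed by a space")
    return problems
```

For example, `check_tag_spacing("hey cortana<MP/> tell me a joke")` returns an empty list, while `check_tag_spacing("directions to<UNKNOWN/>")` reports one problem.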
Transcription Modes

• [tags0301] Tags to be used will depend on the transcription mode requested. Please
refer to the following table to see which tags apply to the different transcription modes.

Tag            Orthographic only (O)    Orthographic + Noise (OT)
<UNKNOWN/>     X                        X
<FILL/>        X                        X
<MP/>          X                        X
<SN/>                                   X
<CNOISE/>                               X

The HITApp informs you whether to use the (O) or (OT) tag set (in the top right
corner of the HITApp UX).
[Example images not reproduced: Orthographic only (O); Orthographic + Noise (OT)]
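
The mode table in [tags0301] can be expressed as a lookup for checking a transcript against the requested mode. This is a sketch under assumptions: it reads the table as allowing <SN/> and <CNOISE/> only in Orthographic + Noise mode, and the names `ALLOWED_TAGS` and `disallowed_tags` are hypothetical.

```python
import re

# Tag sets per transcription mode, as read from the table in [tags0301].
ALLOWED_TAGS = {
    "O": {"<UNKNOWN/>", "<FILL/>", "<MP/>"},
    "OT": {"<UNKNOWN/>", "<FILL/>", "<MP/>", "<SN/>", "<CNOISE/>"},
}

def disallowed_tags(transcript: str, mode: str) -> list[str]:
    """List the tags in a transcript that the given mode does not allow."""
    found = set(re.findall(r"<[A-Z]+/>", transcript))
    return sorted(found - ALLOWED_TAGS[mode])
```

For example, `disallowed_tags("call mom<SN/>", "O")` returns `["<SN/>"]`, while the same transcript passes in mode `"OT"`.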

Active Tags

Tag Definition

<UNKNOWN/> Unknown

[tags0402] Use this tag when there is obvious human speech, but one or more of
the actual words cannot be determined. Difficulty in understanding may be a
result of strong accent, low volume, speech too fast, poor pronunciation, or any
other reason.

[tags0404] Use this tag when either the first word or the last word in an utterance
is cut off (truncated) by the audio.

[tags0405] Do not use this tag when the speech is clear but ambiguous due to
various reasons, such as homophones, variant spelling, etc., for which cases,
consult respective sections: [1701] for ambiguity, [0703] for spelling, or [1301] for
in-box assistance for name spelling etc.

[tags0406] For cases where the entire utterance is unknown (either unintelligible
or in a foreign language), don’t add additional tags. Just mark the entire
utterance as <UNKNOWN/>.

[tags0407] For cases where part of the audio is intelligible and part is not, use
<UNKNOWN/> for the unintelligible portion and continue to transcribe the
intelligible portions as usual, including tags.
<NPS/> There is no longer <NPS/> for ConversationalSpeech HitApp transcriptions.
Please transcribe all speech.
<SN/> Sudden Noise
[tags0601] Any sudden or short noise which is clearly audible (at comparable
volume as the main speaker or louder). i.e., if you can hear a sudden noise, then it
should be tagged. This includes human-generated noises (e.g., cough) as well as
non-human noises. Sudden noises can be considered in two cases:
Isolated noise
[tags0602] If a sudden isolated noise is clearly audible, then it should be tagged. If
you have to strain to hear it, then tagging is not required.
e.g., “call mom” followed by a door slam would be: “call mom <SN/>”
During speech
[tags0603] If the sudden noise occurs during a word, append the tag (with no
spaces) to the end of the word.
e.g., “call mom” with a door slam during “mom” would be: “call mom<SN/>”

<FILL/> Filler
[tags0701] Use if user is producing a filled pause that is not an actual word with
semantic meaning (such as umm, er, ah, etc.).
[tags0702] If the speaker is using a real word as a filler (such as “like”), transcribe
that word. Note, speech like “uh-huh”, “uh-uh” should be treated and transcribed
as actual words.

[tags0703] Use if there is a false start, a partial word (not truncated) or stumbled
over speech in the utterance.
e.g., “gu- going to the store” → “<FILL/> going to the store”
(since gu- is an incomplete word)

[tags0704] Filled pauses are like words and should be separated from neighboring
words by spaces.
[tags0705] Consecutive fillers should be transcribed with a single <FILL/> tag.

Example:

“yeah john <FILL/> i think we should <FILL/> definitely meet this
weekend and <FILL/> figure something out”
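As a rough sketch of rules [tags0701] through [tags0705], the snippet below replaces filled pauses and partial words with <FILL/> and collapses consecutive fillers into one tag. It rests on assumptions that are not in the guidelines: the FILLED_PAUSES list is a made-up sample, partial words are assumed to be written with a trailing hyphen (as in “gu-”), and the function name normalize_fillers is hypothetical.

```python
# Assumed sample of non-word filled pauses; the real inventory is not specified here.
FILLED_PAUSES = {"um", "umm", "uh", "er", "ah"}
FILL_TAG = "<FILL/>"


def normalize_fillers(raw):
    """Return raw with fillers tagged and consecutive fillers merged."""
    out = []
    for token in raw.split():
        is_filler = (
            token == FILL_TAG
            or token in FILLED_PAUSES   # [tags0701] filled pause, not a real word
            or token.endswith("-")      # [tags0703] false start / partial word
        )
        if is_filler:
            # [tags0705]: consecutive fillers become a single tag
            if not out or out[-1] != FILL_TAG:
                out.append(FILL_TAG)
        else:
            # [tags0702]: real words, including "like", "uh-huh", "uh-uh", are kept
            out.append(token)
    # [tags0704]: the tag is separated from neighboring words by spaces
    return " ".join(out)


print(normalize_fillers("gu- going to the store"))
```

Words such as “uh-huh” pass through unchanged because they neither appear in the sample list nor end in a hyphen.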
<CNOISE/> Continuous Noise (sentence level tag)

[tags0901] Use this tag for any utterance that contains continuous noise (e.g.,
crying, music, singing, humming, whistling, laughter, traffic, or other white
continuous noise) throughout the recording. If there are multiple instances of
distinct noises, use <SN/> tag.

[tags0902] Place the <CNOISE/> tag at the beginning of the utterance, followed by
a space. Words from the primary speaker should be transcribed as normal.

[tags0903] If both sentence-level tags apply (i.e., <CNOISE/> and <NPS/>), then list
them in alphabetical order, with no space between them.

Examples:

“<CNOISE/> call mom”
“<CNOISE/><NPS/> call mom”
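The placement and ordering rules in [tags0902] and [tags0903] can be illustrated with a short sketch. The function name apply_sentence_tags is hypothetical, and tag names are assumed to be passed without the angle brackets. Note that <NPS/> is not used for the ConversationalSpeech HitApp, so in practice only <CNOISE/> would appear there.

```python
def apply_sentence_tags(transcript, tag_names):
    """Prefix transcript with sentence-level tags.

    Tags are listed in alphabetical order with no space between them,
    followed by a single space and then the transcribed words.
    (Hypothetical helper written for illustration only.)
    """
    if not tag_names:
        return transcript
    prefix = "".join("<" + name + "/>" for name in sorted(set(tag_names)))
    return prefix + " " + transcript


print(apply_sentence_tags("call mom", ["CNOISE"]))
print(apply_sentence_tags("call mom", ["NPS", "CNOISE"]))
```

Sorting the tag names before joining them is what enforces the alphabetical-order rule regardless of the order in which the tags were noticed.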
<MP/> This tag is used for transcription of “cortana” only.

If the pronunciation of “cortana” is incorrect (e.g., “cortina”, “cortona”, or a
partially pronounced “cortana”) and you are sure “cortana” is the intended word,
transcribe it as “cortana” and add the <MP/> tag right after the word (no space
between “cortana” and the <MP/> tag, but there should be a space between the
<MP/> tag and the following word, if there is one).

Examples:

“hey cortana<MP/> tell me a joke”


“hey cortan-” (where the final “a” is not spoken by the speaker) → “hey
cortana<MP/>”

(For details on transcribing “cortana”, see the section “Cortana Pronunciation”.)
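A possible check for the <MP/> spacing rule is sketched below. It is an assumption-laden illustration rather than an official validator: the function name mp_tags_valid is hypothetical, and the check covers only the placement rule described above (tag attached to “cortana”, then a space or the end of the utterance).

```python
import re

# "cortana" as a whole token, immediately followed by <MP/>,
# then whitespace or end of string.
_MP_OK = re.compile(r"(?<!\S)cortana<MP/>(?=\s|$)")


def mp_tags_valid(transcript):
    """Return True if every <MP/> tag in transcript is correctly placed."""
    total = transcript.count("<MP/>")
    well_placed = len(_MP_OK.findall(transcript))
    return total == well_placed


print(mp_tags_valid("hey cortana<MP/> tell me a joke"))
print(mp_tags_valid("hey cortana <MP/> tell me a joke"))
```

Comparing the count of all <MP/> tags with the count of well-placed ones flags tags that follow a space, follow a different word, or run into the next word.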

5. Exceptions
This section outlines exceptions to the above guidelines, for specific HitApps.

Conversational Speech
• Do not use the <NPS/> tag for this HitApp. All speech shall be transcribed.
• Do not use <NIS></NIS> tags for this HitApp.

6. Guideline Codes
General Format: [Category + GuidelineID + SubGuidelineID]

Category
• tags – for guidelines related to use of tags
• languagecode – 2-letter ISO 639-1 code depending on language (en, ja, fr…)
• none – for the core guidelines that apply to all languages

GuidelineID
• 2-digit code used to identify a single guideline or a group of guidelines

SubGuidelineID
• 2-digit code used to identify a sub-guideline

Examples
• [0401]
• [1303]
• [ja0202]
• [tags0201]
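The code format described above lends itself to a small parser. The sketch below is one way to split a code into its three parts; the function name parse_guideline_code is hypothetical, and the pattern assumes lowercase 2-letter language codes and the literal prefix “tags”, as in the examples.

```python
import re

# Optional category ("tags" or a 2-letter language code),
# then a 2-digit GuidelineID and a 2-digit SubGuidelineID.
_CODE = re.compile(r"^\[(tags|[a-z]{2})?(\d{2})(\d{2})\]$")


def parse_guideline_code(code):
    """Return (category, guideline_id, sub_guideline_id), or None if malformed.

    A code without a prefix belongs to the core guidelines and is
    reported with the category "none".
    """
    match = _CODE.match(code)
    if match is None:
        return None
    category, guideline_id, sub_id = match.groups()
    return (category or "none", guideline_id, sub_id)


for example in ["[0401]", "[1303]", "[ja0202]", "[tags0201]"]:
    print(example, parse_guideline_code(example))
```

The pattern only checks the shape of a code; it does not verify that the two letters are a real ISO 639-1 code or that the guideline exists.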

Maintenance
• As guidelines are added, new unique codes need to be created. Do not reuse retired
codes.

Appendix A Punctuation list

"en-US" and "en-CA"


o .\PERIOD
o \n\NEW_LINE
o \n\NEW_PARAGRAPH
o ,\COMMA
o ?\QUESTION_MARK
o !\EXCLAMATION_MARK
o !\EXCLAMATION_POINT
o :\COLON
o ;\SEMI_COLON
o "\QUOTE
o "\UNQUOTE
o "\QUOTATION_MARK
o “\OPEN_QUOTE
o ”\CLOSE_QUOTE
o “\OPEN_QUOTATION_MARK
o ”\CLOSE_QUOTATION_MARK

"en-GB", "en-IN", and "en-AU"


o .\PERIOD
o .\FULL_STOP
o \n\NEW_LINE
o \n\NEW_PARAGRAPH
o ,\COMMA
o ?\QUESTION_MARK
o !\EXCLAMATION_MARK
o !\EXCLAMATION_POINT
o :\COLON
o ;\SEMI_COLON
o "\QUOTE
o "\UNQUOTE
o "\QUOTATION_MARK
o “\OPEN_QUOTE
o ”\CLOSE_QUOTE
o “\OPEN_QUOTATION_MARK
o ”\CLOSE_QUOTATION_MARK

"fr-FR" and "fr-CA"
o .\POINT
o \n\NEW_LINE
o \n\NOUVELLE_LIGNE
o \n\SAUT_DE_LIGNE
o ,\VIRGULE
o ?\POINT_D'INTERROGATION
o !\POINT_D'EXCLAMATION
o :\DEUX_POINTS
o :\COLON
o ;\POINT_VIRGULE
o «\GUILLEMET_OUVRANT
o »\GUILLEMET_FERMANT
o «\GUILLEMET_GAUCHE
o »\GUILLEMET_DROIT

"it-IT"
o .\PUNTO
o \n\NEW_LINE
o \n\A_CAPO
o \n\NUOVA_RIGA
o \n\NUOVA_LINEA
o ,\VIRGOLA
o ?\PUNTO_INTERROGATIVO
o ?\PUNTO_DI_DOMANDA
o !\PUNTO_ESCLAMATIVO
o :\DUE_PUNTI
o ;\PUNTO_E_VIRGOLA
o “\VIRGOLETTE_APERTE
o ”\VIRGOLETTE_CHIUSE

"de-DE"
o .\PUNKT
o .\SATZENDE
o \n\NEUE_ZEILE
o \n\ZEILENUMBRUCH
o \n\NEW_LINE
o ,\KOMMA
o ?\FRAGEZEICHEN
o !\AUSRUFEZEICHEN
o !\RUFZEICHEN
o !\AUSRUFZEICHEN
o :\DOPPELPUNKT
o :\KOLON
o ;\STRICHPUNKT
o ;\SEMIKOLON
o "\ANFÜHRUNGSZEICHEN
o „\ÖFFNENDES_ANFÜHRUNGSZEICHEN
o “\SCHLIEẞENDES_ANFÜHRUNGSZEICHEN
o „\ÖFFNENDES_GÄNSEFÜẞCHEN
o “\SCHLIEẞENDES_GÄNSEFÜẞCHEN

"es-ES"
o .\PUNTO
o .\PUNTO_FINAL
o \n\SALTO_DE_LÍNEA
o \n\NUEVA_LÍNEA
o \n\NEW_LINE
o ,\COMA
o ?\SIGNO_DE_INTERROGACIÓN
o ?\SIGNOS_DE_INTERROGACIÓN
o !\SIGNO_DE_EXCLAMACIÓN
o !\SIGNOS_DE_EXCLAMACIÓN
o :\DOS_PUNTOS
o ;\PUNTO_Y_COMA
o «\COMILLAS_IZQUIERDAS
o »\COMILLAS_DERECHAS

"es-MX"

o .\PUNTO
o .\PUNTO_FINAL
o \n\SALTO_DE_LÍNEA
o \n\NUEVA_LÍNEA
o \n\NEW_LINE
o ,\COMA
o ?\SIGNO_DE_INTERROGACIÓN
o ?\SIGNOS_DE_INTERROGACIÓN
o !\SIGNO_DE_EXCLAMACIÓN
o !\SIGNOS_DE_EXCLAMACIÓN
o :\DOS_PUNTOS
o ;\PUNTO_Y_COMA
o “\COMILLAS_IZQUIERDAS
o ”\COMILLAS_DERECHAS

"pt-PT"
o .\PONTO_FINAL
o \n\NOVA_LINHA
o \n\MUDAR_DE_LINHA
o \n\NOVO_PARÁGRAFO
o ,\VÍRGULA
o ?\PONTO_DE_INTERROGAÇÃO
o !\PONTO_DE_EXCLAMAÇÃO
o :\DOIS_PONTOS
o ;\PONTO_E_VÍRGULA
o “\ABRIR_ASPAS
o ”\FECHAR_ASPAS

"pt-BR"
o ,\VÍRGULA
o \n\НОВАЯ_СТРОКА
o \\BARRA_INVERTIDA
o \\CONTRABARRA
o /\BARRA
o :\DOIS_PONTOS
o ;\PONTO_E_VÍRGULA
o !\EXCLAMAÇÃO
o !\PONTO_DE_EXCLAMAÇÃO
o ?\INTERROGAÇÃO
o ?\PONTO_DE_INTERROGAÇÃO
o @\ARROBA

"ru-RU"
o .\ТОЧКА
o «\КАВЫЧКА
o »\КАВЫЧКА
o ?\ВОПРОСИТЕЛЬНЫЙ_ЗНАК
o !\ВОСКЛИЦАТЕЛЬНЫЙ_ЗНАК
o :\ДВОЕТОЧИЕ
o ;\ТОЧКА_С_ЗАПЯТОЙ
o -\ТИРЕ
o /\КОСАЯ_ЧЕРТА
o /\ЗНАК_ДРОБИ
o \\ОБРАТНАЯ_КОСАЯ_ЧЕРТА

"ja-JP"
o 。\句点
o \n\改行
o 、\読点
o ?\疑問符
o !\感嘆符
o ・\中点
o ・\中黒

"zh-CN"
o .\句号
o \n\换行
o ,\逗号
o ?\问号
o !\感叹号
o !\惊叹号
o :\冒号
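Each entry in this appendix appears to follow the pattern symbol, backslash, spoken-form name (with “\n” standing for a line break). Under that assumption, the lists can be read into a lookup table as sketched below. The names parse_entry and build_table are hypothetical, and the handling of the bullet marker is an assumption about how the lists are stored as text.

```python
def parse_entry(line):
    """Split one appendix line into (symbol, name); return None if it has no entry."""
    text = line.strip()
    # Drop the leading list bullet, written as "o" or "O" in this appendix.
    if text[:2].lower() == "o ":
        text = text[2:]
    # The name follows the LAST backslash, so a backslash symbol ("\\NAME") still works.
    symbol, sep, name = text.rpartition("\\")
    if not sep or not symbol or not name:
        return None
    if symbol == "\\n":
        symbol = "\n"  # the two characters backslash + n denote a real line break
    return symbol, name


def build_table(lines):
    """Map each spoken-form name to the symbol it produces."""
    table = {}
    for line in lines:
        parsed = parse_entry(line)
        if parsed is not None:
            symbol, name = parsed
            table[name] = symbol
    return table


table = build_table([r"o .\PERIOD", r"o .\FULL_STOP", r"o \n\NEW_LINE"])
print(sorted(table))
```

Keying the table by name rather than by symbol reflects the fact that several names (for example PERIOD and FULL_STOP) can map to the same symbol.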
