
9/27/2018 hi-IN_TEST_SET

**This document is confidential, do not redistribute**

Hindi Written Domain Conventions


You will listen to some audio files and transcribe them according to the rules of your locale.

TRANSCRIPTION QUALITY: Typo; Context error; Added or missing words; Substitution; Spacing

PUNCTUATION: Fragments versus sentences; Commas; Intonation marks; Colon and quotation; Other symbols; Spoken punctuation

FORMAT: Number; Currency and unit; Date and time; Address; Web; Abbreviation

AGREED SPELLING: Spelling out; Interjections; Proper names; Brand and product; Media title; Multiple spellings

DIFFICULT UTTERANCES: Skipping a prompt; Hesitations and truncations; Background and foreground speech; Foreign language; Accents

https://speech.google.com/annotation/guidelines/hi_in_test_set/index.html

Transcription quality
Comply with the standard rules of the writing system.

Typo

A typo results in the unintentional creation of a non-word.

Avoid making any typographical errors. Carefully check your work before marking items as "complete".

म घर जा रहा ं ।
NOT: मे घर जा रहा ं ।

Context error

Do not correct speaker's grammar if they intentionally say something, even if what they say does not follow the standard grammatical rules of the transcription language.

मुझे भूख लगी है ।


"मुझे भूख लगी है ।"
NOT: मुझे भूख लगा है ।

मुझे भूख लगा है ।


"मुझे भूख लगा है ।"
NOT: मुझे भूख लगी है ।

Added or missing words

Do not transcribe words that are not spoken, even if they are obviously intended by the speaker. Avoid putting words in the speaker's mouth. However, do transcribe implied times and
units of currency.

₹300 इस िमठाई के िलए ब त ादा ह। "तीन सौ पय इस िमठाई के िलए ब त ादा ह।"


3:15 का अलाम लगाये। "तीन बजके पं ह िमनट का अलाम लगाये"

Transcribe all words spoken, even if they are not intended by the speaker. For interjections and non-speech vocalizations, refer to Agreed Spelling > Interjections and Difficult Utterances > Hesitations and Truncations.

YouTube YouTube YouTube "youtube youtube youtube"

सिचन फल फल खा रहा है ?

यह िकतनी रसभरी ू बैरीज़ ह? Speaker clearly corrected themselves after "रसभरी".

Substitution

A substitution error occurs when another standard word is transcribed instead of what was meant by the speaker. If what the speaker said falls into another category (Context Error,
Proper Name, Media Title, etc.), see the relevant section.

Spacing

Use only one space between words and sentences.

आपका नाम ा है ?
NOT: आपका नाम ा है ?

मेरा नाम रमेश है । लोग ार से मुझे सनी भी बुलात ह।


NOT: मेरा नाम रमेश है । लोग ार से मुझे सनी भी बुलात ह।

For most types of punctuation, do not put a space between the preceding word and the punctuation.

तुम ा कर रहे हो?


NOT: तुम ा कर रहे हो ?

चुप करो!
NOT: चुप करो !

नम े, यह डॉ. दीपक ह।
NOT: नम े , यह डॉ. दीपक ह ।

For quotation marks and similar punctuation, put a space before the opening punctuation, but not necessarily after the closing punctuation.

संजय ने बोला, "म तुमसे ार करता ं ।"


NOT: संजय ने बोला, " म तुमसे ार करता ं । "
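
The first two spacing rules above (single spaces, and no space before most punctuation) are mechanical enough to check automatically. The following is a minimal Python sketch rather than part of the guidelines: the function name `normalize_spacing` and the set of punctuation marks it handles are assumptions chosen for illustration, and the quotation-mark rule is deliberately left out.

```python
import re

# Assumed set of marks that attach directly to the preceding word.
ATTACHED_PUNCTUATION = "?!,।"

def normalize_spacing(text: str) -> str:
    """Collapse runs of spaces, then drop spaces before attached punctuation."""
    # Rule 1: only one space between words and sentences.
    text = re.sub(r" {2,}", " ", text.strip())
    # Rule 2: no space between a word and the punctuation that follows it.
    pattern = r" +([" + re.escape(ATTACHED_PUNCTUATION) + r"])"
    return re.sub(pattern, r"\1", text)
```

A transcription that already follows both rules passes through unchanged, so the function can double as a check: compare its output with its input.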


Punctuation
Follow the punctuation regulations of your locale. Additional conventions are outlined in this section.

Fragments versus sentences

Add punctuation where needed, but err on the side of keeping it minimal.

Full sentences should end with a punctuation mark.

म तुमसे ार करता ं ।
NOT: म तुमसे ार करता ं

In general, a complete sentence contains a subject and a verb.

वह आज घर से काम करे गा। Includes subject and verb.

मुझे तुम पसंद हो। Includes subject and verb. Sounds like a whole utterance rather than just a conjunction to a larger sentence.

Sometimes a phrase which is not obviously grammatically a sentence should nevertheless be treated as a sentence because of its context, e.g. if it's an answer to a specific question, or if it's an example where dropping the subject sounds completely natural as a complete sentence.

तुम िकतने साल के हो? सोलह। two speakers

िपछली बार No context to suggest this is a sentence; treat it as a fragment.

फूलों के िच Google search for images

खाने पर आ रहे हो कल? Although the subject is dropped, this still sounds completely natural and should be treated as a complete sentence.

िद ी का मौसम This is asking for information, but the most likely interpretation is as a sentence fragment on its own.

Interjections, greetings, and farewells said in isolation should be considered complete sentences and punctuated as such.

वाह! interjection

नम े। greeting

प ा। िफर िमलते ह। This includes both a yes/no word and a farewell, with a long pause between.

Below are some examples of common interjections.

ओह अरे हाहा ओ हो
हे भगवान हाय धत

Do not punctuate phrases that the speaker intends as a web search rather than as a full sentence.

इं िडया का िच
NOT: इं िडया का िच ।

अिमताभ ब न की िफ
NOT: अिमताभ ब न की िफ ।

Capitalize sentence fragments that sound like the beginning of a sentence. Add end punctuation to sentence fragments that sound like the end of a sentence. For fragments that do not
clearly sound like the beginning or end of a sentence, leave out capitalization and punctuation. Note that sentence fragments may be a result of cut-off audio samples.

तु ा लगता है ? ऐसा नहीं है िक Begins as complete sentence and ends mid-stream.

मु ल था। इस बात का कोई मतलब ही नहीं है । Fragment is the end of a sentence.

बोला िक इस बारे म संजय से बात मत करना। Audio was cut off at the beginning.

दु कान पर जा रहा ं । तु े पता है िक यह चाय िकतने की है ? Do not put a period, hyphen, or ellipsis, even if another sentence follows.

यह िकतने का कहां है तु ारा Both sound like beginnings of sentences.

कहां हो कहां हो तुम? Repeated beginning of the sentence.

जा रहे थे परं तु Sounds like the middle of a sentence; beginning and end were cut off.

If an utterance is not clearly a sentence according to the above rules and examples, do not capitalize or punctuate it as a sentence.

Commas

Only use commas where required. Err on the side of minimal punctuation. Do not rely on intonation.

अगला पेटोल पंप िकधर है ?


NOT: अगला, पेटोल पंप, िकधर है ??
Even if the speaker uses long pauses in these places, do not use a comma. There are places where commas are allowed or required, but this example contains neither.

For complete sentences that follow a single word or phrase that focuses the meaning of a sentence, put a comma after the single word or phrase.

ताज महल, बंद है ा? topic-comment

कद् दू , फल या स ी? topic-comment

Use a comma when a sentence starts with a discourse word, interjection, or yes/no word. However: If there is a long pause between a discourse word, interjection, or yes/no word and a
full sentence that follows it, treat that initial word as a separate sentence.

लेिकन, वो सच हो सकता है । Discourse word.

वाह, बेटा। Interjection.

अ ा दो , जो भी करो सावधानी से करना। Yes/no word. Other examples of these types of items include "हाँ ", "अ ा" and others.

हां , मुझे ीकार है Yes/no word.

शायद, पर मुझे प ा नहीं पता। Use a comma when there is no pause, or when there is a pause that isn't long.

शायद। पर मुझे प ा नहीं पता। Use a period when there is a substantial pause after "शायद".

Use commas in lists.

मेरा बेटा भोला, छोटा, ारा और नटखट है ।

Use commas for non-restrictive modifiers, but do not use commas for restrictive modifiers. The basic test for this is whether the modifier can be dropped from the sentence and still keep basically the same meaning.

इं िडया के धान मं ी, नर मोदी, अमे रका गए थे। Non-restrictive modifier. "नर मोदी" does not change the core meaning of "इं िडया के धान मं ी", it just adds additional information about the Indian prime minister.

Use commas in sign-offs, such as those at the end of a message. Do not use end punctuation.

तु ारी दो , सोनाली

Do not use commas in sentences that consist only of a greeting and an addressee. If a greeting occurs at the beginning of a sentence or fragment, place a comma after the greeting. If the
greeting includes an addressee, place the comma after the addressee.

नम े।

नम े िवनोद।

नम े, म अंजिल बोल रही ं ।

नम े नेहा, म पूजा बोल रही ं ।

नम े नेहा। म पूजा बोल रही ं । Long pause between "नम े नेहा।" and "म पूजा बोल रही ं ।". Treat as separate sentences.

Except in greetings, sentence-initial and sentence-final addressees should be separated by a comma.


सिचन, मुझे कॉल कर।

तू कैसी है , मनीषा?

मनोज, नम े, िववेक बोल रहा ं ।

The phrase "Ok Google" in isolation is transcribed without a comma or end punctuation. When the phrase appears before longer utterances, place a comma after "Google".

Ok Google

Ok Google, फूलों की त ीर िदखाओ।

Ok Google, रॉक ार के गाने डाउनलोड करो।

Ok Google, इस साल िदवाली कब है ?

Intonation marks

Capitalize and punctuate the following as questions: 1) All queries syntactically built as questions, regardless of intonation. 2) All queries which sound like they are being used as
questions, regardless of sentence structure.

तीन बजे? Utterance uses rising intonation.

और ाती भी आ रही है ? Utterance uses rising intonation.

िद ी का मौसम Query uses rising intonation, but is most likely a web search rather than a true question.

If a speaker uses clearly exclamatory intonation, use an exclamation point. If there is any doubt, err on the side of using a period.

चुप कर!

िबलकुल! Speaker sounds enthusiastic.

अ ा। Speaker sounds unenthused.

तू तो डरपोक है । Spoken dispassionately.

ज िदन मुबारक! Spoken with enthusiasm.

Colon and quotation

Use a comma between reported speech verbs and direct quotations. Do not put punctuation within quotation marks unless the punctuation belongs to the reported speech.

मेरा दो बोला, "मगरमछ"।


NOT: मेरा दो बोला, "मगरमछ।"
NOT: मेरा दो बोला "मगरमछ।"
NOT: मेरा दो बोला "मगरमछ"।
The word "बोलना" is the most common reported speech verb in Hindi, but other words ("पूछना", "कहना") can be used for reported speech.

If the text in quotation marks qualifies as a sentence, punctuate as if it were its own utterance. Do not alter its end punctuation even if the quote is within a sentence. Do not add excess punctuation after end quotation marks.

नेहा बोली, "तीन बजे िमलते ह।"


NOT: नेहा बोली, "तीन बजे िमलते ह।"।
The word "बोलना" is the most common reported speech verb in Hindi, but other words ("पूछना", "कहना") can be used for reported speech.

जितन ने पूछा, " ा हम तीन बजे िमलगे?"


NOT: जितन ने पूछा, " ा हम तीन बजे िमलगे?"।

Use a colon but no quotation marks in quotative voice actions when the quote follows the command. Use quotation marks when the quote is in the middle of the sentence.

च म "मुझे तुमसे ार है ।" कैसे कहते ह? The quote is in the middle of a sentence, so use quotation marks.

जापानी म कैसे कहते ह: मुझे पीनी है ।

example@gmail.com को ईमेल भेजो: तुम कब आओगे?

When speakers make a request for single words to be translated into another language, don't punctuate or capitalize the words, even if you'd consider the words as sentences in other
situations.

ेिनश म "नम े" का अनुवाद कर।

नम े।

Do not use quotation marks for metalinguistic uses of words or phrases. These uses include defining the word, talking about the spelling of the word, or any other type of reference to the word itself as a thing.

ब े ने अभी पापा बोला।


NOT: ब े ने अभी "पापा" बोला।

Other symbols

Apart from standard letters, you should not use any other symbol than: 0-9 äâàæÆçÇéèëêïîñÑôöŒœüûùμÿÄÂÀÉÈËÊÏÎÔÖÜÛÙŸ²³,?!'"_°:.()<>{}[]√/@#$€£₹+=%*&-.;

When two opposing teams are mentioned, include a hyphen between their names.

ा तुमने कल भारत-पाक का खेल दे खा?

Include a hyphen between locations in flight itineraries.

ा मुंबई-िद ी की उड़ान दो घंटे की है ?


NOT: ा मुंबई िद ी की उड़ान दो घंटे की है ?

Use a hyphen in phrases and compounds typically written with a hyphen. If in doubt, use a hyphen.

मेरे माता-िपता बरे ली से ह।

वहां कभी-कभी जाते हो?

Spoken punctuation

For sentence-level spoken punctuation, write out the full word or words between curly brackets. Do not add punctuation symbols after spoken punctuation. Be careful with homonyms.
(See exceptions in the next rule.)

तुम कैसे हो { िच }
NOT: तुम कैसे हो?
"तुम कै से हो िच "
NOT: तुम कैसे हो िच
NOT: तुम कैसे हो िच ?

Don't spell out internal punctuation like hyphens in web pages, email addresses, addresses, phone numbers, or other word-level punctuation.

If a word that can refer to a punctuation mark is spoken in isolation, it should be written out between curly brackets.

{पूण िवराम}

{अ िवराम}


Format
Transcribe numbers, abbreviations etc. following the formatting conventions in this section.

Number

Do not use Devanagari numerals; use only Western Arabic numerals.

Cardinals and ordinals from 0 to 9 are written with letters (except for measures and currency - see Currency and Unit). Use digits for cardinals and ordinals 10 and above, even if they are
coordinated with numbers under 10. Transcribe all decimal numbers as digits.

मेरी क ा म नौ िव ाथ ह। numbers less than 10

मेरी क ा म 47 िव ाथ ह। numbers greater than 9

When two or more numbers refer to the same noun, and one number is 10 or greater, transcribe both as numerals.

वो 9 या 10 कु े लेकर आये।

यहां पां च घोड़े और 20 बैल रहते ह।
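
The choice between words and digits for whole numbers can be summarized as a small decision function. This is a hedged Python sketch, not an official tool: the name `format_cardinal`, its parameters, and the spellings in `DIGIT_WORDS` are assumptions made for illustration, and it covers only non-negative whole numbers (decimals are always written as digits).

```python
# Assumed Hindi spellings for 0-9; confirm against the agreed spellings.
DIGIT_WORDS = ["शून्य", "एक", "दो", "तीन", "चार", "पांच", "छह", "सात", "आठ", "नौ"]

def format_cardinal(n: int, with_unit: bool = False, same_noun: tuple = ()) -> str:
    """Return the written form of a whole number under the rules above.

    with_unit: the number precedes a measure or currency (always digits).
    same_noun: other numbers that refer to the same noun in the utterance.
    """
    if n < 0:
        raise ValueError("only non-negative whole numbers are handled")
    # Digits for 10 and above, for measures and currency, and whenever a
    # number sharing the same noun is 10 or greater.
    if with_unit or n >= 10 or any(m >= 10 for m in same_noun):
        return str(n)
    return DIGIT_WORDS[n]
```

For the "9 or 10 dogs" example, calling `format_cardinal(9, same_noun=(10,))` yields digits because the two numbers share a noun, while the five in "five horses and 20 oxen" stays a word because the nouns differ.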

If a large number consists of only a number followed by "हज़ार", "लाख", "करोड़", or higher, then transcribe as a numeral plus word. Otherwise, transcribe as numerals.

7 करोड़ "सात करोड़"

1 हज़ार लोग "एक हज़ार लोग"

Write lists of numbers with digits and without commas.

0 1 1 2 3 5 8 13 शू एक एक दो तीन पां च आठ तेरह

For long numbers (4+ digits) indicating quantity, insert the relevant separator (comma, decimal point, or space, depending on language).

10,000 "दस हज़ार"
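
As a rough illustration of separator placement, the sketch below groups digits in the Indian style (the last three digits, then pairs). That grouping is an assumption: this rule does not state it, although the "1,00,00,000" form shown under Currency and unit suggests it. The function name `group_indian` is hypothetical.

```python
def group_indian(n: int) -> str:
    """Insert commas into a non-negative integer using Indian digit grouping."""
    digits = str(n)
    if len(digits) <= 3:
        return digits  # fewer than four digits: no separator needed
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two-digit groups off the right-hand end of the remaining digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])
```

Under this assumption, `group_indian(10000)` gives `10,000`, which matches the example above.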

In math expressions or units & measures, transcribe fraction words using numerals and slashes.

उ 1/4 िक. . की आव कता है ।


NOT: उ चौथाई िक. . की आव कता है ।
"उ चौथाई िकलो ाम की आव कता है ।"
NOT: उ ¼ िक. . की आव कता है । (bad because it includes the pre-combined fraction ¼)
NOT: उ 0.25 िक. . की आव कता है ।

3/4 िक.मी. म दाएं मुड़।


NOT: तीन चौथाई िक.मी. म दाएं मुड़। "तीन चौथाई िकलोमीटर म दाएं मुड़।"
NOT: 0.75 िक.मी. म दाएं मुड़।

5*6
NOT: पां च * छह "पां च गुना छह"
NOT: 5 गुना 6

For mixed numbers in math and units & measures, use numerals with "and".

5 फुट 3 1/2 इं च "पां च फुट साढ़े तीन इं च"

When referring to items (not units or measures), write fractions out in words. With mixed numbers, write the whole number part out in words if it is under ten, otherwise write it with
numerals.

मुझे आधी रोटी दीिजये।


NOT: मुझे 1/2 रोटी दीिजये। "मुझे आधी रोटी दीिजये।"
NOT: मुझे 0.5 रोटी दीिजये।

For mixed numbers that represent currency amounts, always use decimals.

तुम मुझे $2.50 दे सकते हो? "तुम मुझे ढाई डॉलर दे सकते हो"
शीतल ने यह घर ₹7.5 करोड़ का ख़रीदा है । "शीतल ने यह घर साढ़े सात करोड़ पये का खरीदा है ।"

Transcribe percentages using numerals and the % sign. (In the unlikely case that you encounter a number of a million or greater used as a percentage, spell it out.)

1 िमिलयन ितशत

50% खाना गायब था

If a number appears in a context which calls for a certain formatting in your language, use that formatting. Otherwise, default to the general rule for transcribing numbers.

Use Roman numerals only when part of an official name or title.

Super Bowl XLVII "super bowl forty seven"

King Henry VIII "king henry the eighth"

Transcribe seasons and episodes of television shows with numerals.

सीज़न 3 एिपसोड 2 "सीज़न तीन एिपसोड दो"

दे ख भाई दे ख एिपसोड 2 "दे ख भाई दे ख एिपसोड दो"

If it is a product type or statistic, use the common written form.

4x4 "चार बटा चार"

उसने 4.2 का ब ेबाजी औसत रखा। "उसने चार दशमलव दो का ब ेबाजी औसत रखा।"

Transcribe phone numbers using the most common format in the transcription language.

Transcribe phone numbers as you would write them down in their natural blocks. Do not use hyphens between blocks. When applicable, the STD code should be surrounded by spaces.

+91 9897 034 241 " स नौ एक नौ आठ नौ सात शू तीन चार दो चार एक"
91 22 3988 3988 "नौ एक दो दो तीन नौ आठ आठ तीन नौ आठ आठ"

91 022 3988 3988 "नौ एक शू दो दो तीन नौ आठ आठ तीन"

Transcribe alpha-digit sequences (product codes etc.) in their most natural way (possibly several ways accepted). Do not transcribe credit card numbers, etc.

XT 660 or XT660

If it really sounds like a math expression, then transcribe it with numbers and symbols, with spaces in between.

5/6 "पां च बटा छह"

5 * 6 िकतना होता है ?
NOT: पां च बारी छह िकतना होता है ? "पां च गुना छह िकतना होता है "
NOT: 5 गुना 6 िकतना होता है ?

Currency and unit

Transcribe currencies as commonly written in the transcription language.

When a speaker uses words like "dollar" without specifying a quantity, spell them out.

बस थोड़े पए

एक भारतीय पया िकतने अमे रकी डॉलर होता है ?

एक भारतीय पया िकतने पािक ानी पए के बराबर होता है ?

तु नेपाली पया चािहए या अमे रकी डॉलर?


For ranges or non-specific currency quantities, write everything out as spoken.

मुझे चार सौ या पां च सौ पए चािहए।

एक से पां च अमरीकी डॉलर

100 से 400 पए "एक सौ से चार सौ पए"


9 से 12 पौंड "नौ से बारह पौंड"

एक या दो ऑ े िलयाई डॉलर

Abbreviate all units that follow numeric values.

मेरा प रवार 10 िक. . आलू लेकर आया है । "मेरा प रवार दस िक आलू लेकर आया है ।"

Transcribe all numeric values preceding units in numeral form, even if under 10.

वो मकान £1 करोड़ का है ।
NOT: वो मकान £1,00,00,000 का है ।

उसका वज़न 2 िक. . है ।


NOT: उसका वज़न दो िक. . है ।

म वहां 6 महीने से ं ।
NOT: म वहां छह महीने से ं ।

If it is clear from context that a number or number sequence refers to currency or time, format it as such.

गैस की कीमत $1 ित लीटर है । "गैस की कीमत एक डॉलर ित लीटर है ।"


दू ध ₹40 का है । "दू ध चालीस का है ।"

5:45 का अलाम लगाओ। "पां च बजके पतालीस िमनट का अलाम लगाओ।"

Common technical abbreviations

मेगाबाइट - MB िकलोबाइट - KB गीगाबाइट - GB टे राबाइट - TB

Slang terms (spell them out)

िकलो

Common measurements of distance and rate

इं च - इं च फुट - फुट याड - याड मील - मील


िमली मीटर - िम.मी. सटी मीटर - स.मी. मीटर - मीटर िकलो मीटर - िक.मी.
मील ित घंटा - मील ित घंटा मील ित घंटा - मील ित घंटा िकलोमीटर ित घंटा - िकमी/घंटा

Common measurements of area

वग िकलोमीटर - िकमी²

Common measurements of weight and volume

ाम - . िमली ाम - िम. . िकलो ाम - िक. .

Date and time

Use the natural form for transcribing dates.

जुलाई 12 1964 "जुलाई बारह उ ीस सौ चौंसठ"


12 जुलाई 1964 "बारह जुलाई उ ीस सौ चौंसठ"
70वीं सदी के दौरान "स रवीं सदी के दौरान"
आज तारीख है 5.10.2012 "आज तारीख है पां च दस दो हज़ार बारह"

Exception: When the date is spoken as a sequence of numbers, transcribe as such.

12/07/2010 "बारह ैश सात ैश दो हज़ार दस"


12.07.2010 को इसके खराब होने का समय है "बारह सात दो हज़ार दस को इसके खराब होने का समय है "

Use the natural form for transcribing times whenever possible.

Write times in hh:mm format whenever possible, unless it would look unnatural to do so.

3:00 "तीन बजे"

4:00 "चार बजे"

3:15 "तीन पं ह"


6:05 "छह बजके पां च िमनट"

3:15 "तीन बजके पं ह िमनट"


1:50 "दो बजने म दस िमनट"

6:45 "पौने सात"

4:15 "सवा चार"

म 12:30 प ं चूंगी "म साढ़े बारह प ं चूंगी"


8 बजे के आस पास "आठ बजे के आस पास" Writing 8:00 here would be unnatural.
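
The spoken forms in the time examples above all reduce to an hour plus or minus some minutes. The Python sketch below shows that arithmetic on a 12-hour clock. The names `to_hhmm` and `QUARTER_OFFSETS` and the transliterated keys are assumptions for illustration; recognizing the Hindi words in a transcript is outside its scope.

```python
# Minute offsets for the quarter-hour words (transliterated keys are assumed):
# sava = "सवा" (quarter past), sadhe = "साढ़े" (half past), paune = "पौने" (quarter to).
QUARTER_OFFSETS = {"sava": 15, "sadhe": 30, "paune": -15}

def to_hhmm(hour: int, offset_minutes: int = 0) -> str:
    """Format an hour plus a signed minute offset as h:mm on a 12-hour clock."""
    total = ((hour % 12) * 60 + offset_minutes) % 720  # 720 minutes = 12 hours
    h, m = divmod(total, 60)
    return f"{h or 12}:{m:02d}"  # hour 0 is written as 12
```

For instance, "पौने सात" corresponds to `to_hhmm(7, QUARTER_OFFSETS["paune"])`, and "दो बजने म दस िमनट" to `to_hhmm(2, -10)`.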

Address

Favor full spellings over abbreviations where natural, but use abbreviations when explicitly spoken.

Use commas for ENTITY, LOCATION.

Pizza Hut, जनक पुरी

राजेश कुमार, यमुना िवहार

होटल, पीतम पुरा

डॉ. आनंद, मुंबई

मोिहत पटे ल, आउटर रं ग रोड

मौसस, िद ी

धमशाला, राउरकेला

PK िफ का समय, िबरसा रोड म


Web

Write URLs, email addresses, and Twitter hashtags as they are spoken and don't capitalize them.

www.google.co.kr "w w w dot google dot c o dot k r"

amazon.com "amazon dot com"

http://123.com "h t t p colon slash slash one two three dot com"

mike@example.org "mike at example dot org"

मुझे पराठे पसंद है । #म न "मुझे पराठे पसंद है । है शटै ग म न"


#भूख पराठे कहां है ? "है शटै ग"

Do not correct speaker errors such as transcribing a slash when the user actually says "backslash".

http://nytimes.com "h t t p colon slash slash n y times dot com"

http:\\mail.yahoo.com "h t t p colon backslash backslash mail dot yahoo dot com"

www.forbes.com "w w w forbes dot com"

If the speaker drops a "w" or dots and it's an obvious URL, you should correct these errors. If the speaker doesn't say the "w"s at all, do not add them.

www.amazon.com "w w dot amazon dot com" If the user mistakenly says "ww", transcribe "www".

google.co.uk "google dot co u k" Also transcribe the dot in an obvious URL, even if the speaker did not include it.

www.forbes.com "w w w forbes dot com"

www.facebook.com "w w facebook dot com"

If a URL is spelled out in individual letters, transcribe without spaces between individual letters.

www.target.com "w w w dot t a r g e t dot c o m"
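
The mapping from spoken tokens to a written URL or email address can be sketched as a simple token substitution. The Python below is illustrative only: `spoken_to_url` and its lookup tables are hypothetical names, it handles only English spoken forms like those in the examples, and it does not apply the corrections for a dropped "w" or dot, which need human judgment.

```python
# Spoken symbol names and digit words mapped to their written characters.
SYMBOLS = {"dot": ".", "at": "@", "colon": ":", "slash": "/", "backslash": "\\"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def spoken_to_url(spoken: str) -> str:
    """Join spoken tokens into a lowercase URL or email address, as spoken."""
    written = []
    for token in spoken.lower().split():
        # Replace symbol names and digit words; keep letters and words as they are.
        written.append(SYMBOLS.get(token, DIGITS.get(token, token)))
    return "".join(written)  # no spaces between spelled-out letters
```

Because the tokens are copied through as spoken, a speaker who says "backslash" gets a backslash in the output, in line with the rule against correcting such errors.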

Abbreviation

Do not abbreviate unless the speaker says an abbreviated form.

बनारस िहं दू िव िव ालय


"बनारस िहं दू िव िव ालय"
NOT: बी.एच.यू.

भारतीय जनता पाट


"भारतीय जनता पाट "
NOT: भाजपा

In acronyms, do not use periods between letters.

AT&T, MP3 brands and products

यूपी, एपी geographic names

भाजपा, इसरो, सपा pronounced as words

If a brand name uses periods, include the periods.

J. C. Penney Official brand name as seen in the privacy policy includes periods.


Agreed spelling
Spelling conventions for words where several options are possible, as well as proper names.

Spelling out

If a word is spelled or obvious pauses are made between letters, spell it into letters as it is said (often done for foreign names or businesses, for example). Use lowercase letters for the
spelled-out portion. This rule does not apply to acronyms or initialisms, or to spelled-out web or email addresses.

पा ल प आ र उ ल Person said "पा ल" and then spelled it.

प आ र क कहां है ? "प आ र क कहां है "

अआइईउऊएऐ spelled out alphabet

amazon.co.uk "amazon dot c o dot u k"

सीईओ "सी ई ओ"

सभी वीआईपी आगे बैठगे। spelled out "वी आई पी"

FIFA Pronounced the word as "फीफा", or spelled out "एफ आई एफ ए".

एं सीएए Speaker says "एन सी डबल ए", or "एन सी ए ए".

एएए Speaker says "िटपल ए", or "ए ए ए".

म डॉ.आनंद को जानता ं । "म डॉ र आनंद को जानता ं ।"


िम.शमा नहीं आएं गे। "िम र शमा नहीं आएं गे।"

Interjections

Transcribe words representing laughter or other non-speech vocalizations with up to three syllables, but no more.

heh, ha, haha, hahaha, hehe, hehehe, boo hoo, boo hoo hoo, lalala

हा हा हा "हा हा हा हा हा"

Ignore actual laughter that is included within speech. If the entire audio contains only laughter, use the [skip] tag in PeraPera or select the appropriate reason from the 'Cannot transcribe'
menu in Crowd Compute.

अरे ! actual laughter followed by "अरे " with clear exclamatory intonation

[skip] Entire audio contains only actual laughter, sighs, etc.

Spellings of common interjections

आह ओह अरे अजी
हे भगवान हे राम हट हाहा
बाप रे अहा

Proper names

Use official spelling, capitalization, and punctuation for proper names. Google them and pay attention to the correct format. Official format and spelling of a proper name may supersede the usual written transcription conventions detailed in this document.

Common spellings of names

आिद अिनल अिभषेक अनीता


अंकुर अंिकत अहमद आं चल
आशा ऐ य एकल उमंग
उषा ऋिष अंगद कीित
केशव कृपा क ना का ा
खुशबू ख़ुशी ाित गौतम
ग रमा घन ाम चा चंपक
चीना िचरं जीव छिव िजत
जीिवका जागृित जयित झलक
झुम टीना डे िवड ितलक
तनय त य िदवेश िदवाकर
िदनेश िद ा दी धवल
नमन िनतेश नीतू नीना
नवीन नीरव परी ा ीित
भात पूिणमा पूनम िबमला
भावना मीरा मीनल मिहमा
महे श मीना ी रे शमा रे वती
रीता रीटा रोिहत ा
िसमरन ानेश युवराज

Brand and product

Format proper names as they are most commonly formatted on the entity's website (especially official documents), if available, or the Wikipedia or IMDb page. In cases of ambiguity, defer to their privacy policy. If no other sources, use top Google hits.

वो Amazon म काम करता है ।

मने सुना है Yahoo और Nokia म समझौता हो गया है ।

म Pizza Hut और Subway म अ र खाती ं ।

YouTube

Kolkata Knight Riders की जीत ई है ।

Apple के सामने से बाएं मुड़ जाएं ।

Burger King
Do not spell "Burger King" all in upper case as in the stylized form of the logo, stick to the official form as per the privacy policy.
NOT: BURGER KING

LEGO
NOT: Lego

Spellings of common Brand and Product names

1D 3DS 4 Pics 1 Word 4chan


Abba Adidas Aldo Amazon
Android Market Angry Birds Babies "R" Us Barclays
BBC One Black & Decker Black Ops 2 BlackBerry
Blink-182 Burger King Casio Chanel
Chrome Citroën Claire's Coca-Cola
CrossFit DirecTV Domino's Dragon Quest IX
Droid Razr e-cigarette Earthlink easyJet
eBay eHarmony EVA Siri for Android Evernote
Facebook FIFA Flickr Formula 1
Gmail Google Google Apps Google Calendar
Google Earth Google Images Google mail Google Search


Google Street View Google Toolbar GSMArena GSX-R/4


GTA V Häagen-Dazs Haribo Hawk-Eye
HobbyKing HomeShop18 Hotmail IKEA
iMac IMDb iOS iPad
iPhone iPlayer iThemes ITV Player
Je t Kellogg's Kit Kat Land Rover
LazyTown LEGO LEGOLAND LinkedIn
LOVEFILM Maroon 5 McDonald's Megabus
Mickey D's Minecraft Mini Mk4
NAPA Auto Parts Nesquick Netflix NeXT
Nice 'n Easy Nike Odeon Oral-B
Picasa PizzaExpress Plants vs. Zombies PlayStation 4
PlayStation 1 Politico PornHub Porsche
पो ऑिफस PowerPoint PS4 Ray-Ban
RealPlayer Rolls-Royce Samsung Galaxy Samsung Galaxy S II
Samsung Galaxy S III Samsung Galaxy S4 Samsung Galaxy S5 SimCity
Siri Smart car Snow+Rock SpongeBob SquarePants
Starbucks T-Mobile T.J. Maxx Texas hold 'em
TobyGames PewDiePie TomTom Tour de France
Toys "R" Us Travelodge Tumblr Twitter
Virgin Media Visa WhatsApp WrestleMania XXX
WWE '13 Xbox Xbox 360 Xbox One
Yahoo YouPorn YouTube Zagat
ZBox

The phrase "Ok Google", as well as possible derivatives such as "Ok Google Now" and "Ok Glass", require their own particular spelling of "okay". This spelling is unique to these cases.

Ok Google

Ok Google Now

Ok Google, Barista कहां है ?

Ok Google, आलू

Okay.

Okay, मोिहत।

Okay रमन, अब चलना चािहए।

Media title

Refer to the Google Play Store for official spellings of media titles. For film/television, IMDb is also available. If an utterance is ambiguous between a media title and a sentence or web search, use your judgment for which is more likely; if truly unclear, default to media title.

Write media titles as they are most commonly written. Movie titles and English book titles should be written in Devanagari.

दाग द फायर
NOT: Daag: The Fire

गेम ऑफ़ ोंस
NOT: Game of Thrones

गोदान
NOT: Godaan

है री पॉटर
NOT: Harry Potter

राजनीित
NOT: Raajneeti

Do not use quotation marks for media titles.

िकक िफ के िदखाएं ।
NOT: "िकक" िफ के िदखाएं ।

अजुन कपूर की िफ गुंडे.


NOT: अजुन कपूर की िफ "गुंडे".

Shaadi Ke Side Effects

Transcribe all media titles with original punctuation. In cases where original punctuation falls at the end of a sentence, do not transcribe sentence-level punctuation. That is, media title punctuation trumps sentence level punctuation when in conflict. If a popular media title consists of an entire sentence but the official spelling is without punctuation, then don't punctuate the title. If an utterance is ambiguous between a media title and a sentence or web search, use your judgment for which is more likely and treat it accordingly.

करार - द डील कब िनकली थी?

पुरे प रवार की मनपसंद िफ है दावत-ए-इ .

Treat foreign titles the same way as titles in the transcription language if you understand them.

Y Tu Mama Tambien

Multiple spellings

End words with specific characters as listed below:

गए
"ए" instead of "ये"
NOT: गये

जाइए
"ए" instead of "ये"
NOT: जाइये

गई
"ई" instead of "यी"
NOT: गयी

छाई
"ई" instead of "यी"
NOT: छायी

गएं
"एं " instead of "य"
NOT: गय

आएं
"एं " instead of "य"
NOT: आय

गईं
"ईं" instead of "यीं"
NOT: गयीं

बाईं
"ईं" instead of "यीं"
NOT: बायीं

Use anuswara, ◌ं , instead of half म when the next character is any of प series consonants प, फ, ब, भ.

भूकंप
NOT: भूक

चंबल
NOT: च ल

गंभीर
NOT: ग ीर

Use anuswara, ◌ं , instead of half न or half ण when the next character is श, ष, स, or any of the क, च, ट, त, series. The full set of these characters is श, ष, स, क, ख, ग, घ, च, छ, ज, झ, ट, ठ, ड, ढ, त, थ,
द, ध.


मंच
NOT: म

िहं दी
NOT: िह ी

नीलकंठ
NOT: नीलक

रघुवंश
NOT: रघुव श

There is one exception to the above two rules. When the previous character is ◌ॉ, do not use anuswara, ◌ं .

"सॉ ग"
सॉ ग
NOT: सॉंग
If you followed the above rules, सॉ ग will transform into सॉंग. While the character sequence in the latter is actually स, ◌ॉ, ◌ं , ग, it looks like the sequence स, ◌ा, ◌ँ , ग, which has a different pronunciation.

सॉ र English word "sombre"
NOT: सॉंबर

Always use anuswara ◌ं . Since chandrabindu ◌ँ and anuswara ◌ं are commonly interchanged, only use anuswara ◌ं .

लंगूर
NOT: लँगूर

हं सो
NOT: हँ सो

आं सू
NOT: आँ सू


NOT: ँ

लडिकयां
NOT: लड़िकयाँ

मु ु राएं
NOT: मु ु राएँ
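
The anuswara rules above operate on specific character sequences, so they can be expressed as substitutions over Unicode code points. The following Python sketch is one possible reading of those rules, not an official normalizer: the function name `apply_anuswara_rules` is hypothetical, a half consonant is assumed to be encoded as the consonant followed by the virama (U+094D), and nukta forms and other edge cases are not considered.

```python
import re

ANUSWARA = "\u0902"      # ◌ं
CHANDRABINDU = "\u0901"  # ◌ँ
VIRAMA = "\u094D"        # ◌्, turns the preceding consonant into a half form
CANDRA_O = "\u0949"      # ◌ॉ, the exception: no anuswara directly after it

# Half म before a consonant of the प series.
HALF_M = re.compile(f"(?<!{CANDRA_O})म{VIRAMA}(?=[पफबभ])")
# Half न or half ण before श, ष, स or a consonant of the क, च, ट, त series.
HALF_N = re.compile(f"(?<!{CANDRA_O})[नण]{VIRAMA}(?=[शषसकखगघचछजझटठडढतथदध])")

def apply_anuswara_rules(word: str) -> str:
    """Rewrite half nasals and chandrabindu as anuswara, per the rules above."""
    word = HALF_M.sub(ANUSWARA, word)
    word = HALF_N.sub(ANUSWARA, word)
    # Chandrabindu is always replaced by anuswara.
    return word.replace(CHANDRABINDU, ANUSWARA)
```

The negative lookbehind implements the ◌ॉ exception, so a word such as the transliteration of "song" keeps its half न.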

If you hear a word that does not sound like a standard word of your language because there is a small sound change (i.e. accent, speech error, speech impairment, etc), transcribe the
intended word.

अिभषेक "अिभसेक"

रसगु ा "रसोगु ा"

Transcribe onomatopoeia when clearly spoken. Otherwise, use the [skip] tag in PeraPera or select the appropriate reason from the 'Cannot transcribe' menu in Crowd Compute.

िमयां उ Person says "िमयां उ".

[skip] Person mimics a cat.

If you hear a word that does not sound like a standard word of your language, but it is obviously based on real words, suffixes, or prefixes, transcribe as is.

है रािनयां even if they meant "है रानी".

If you hear a word that does not sound like a standard word of your language because it appears to be nonsense, first perform a Google search for the word. If there is a clear candidate, transcribe that word.

रामगड User says "रामगड". This might sound like nonsense at first, but the transcriber guesses the spelling "रामगढ़" and is corrected by Google Search to "रामगड", a place in India. Transcribe रामगड.

भिड़या User says "भिड़या". Transcriber searches "बिढ़या", finds correct results. Transcribe भिड़या

If a word appears to be nonsense and a Google search returns no clear results but it is easy to spell and articulated clearly, transcribe it anyway.

रजनाल

If a word appears to be nonsense, a Google search returns no clear results, and the word is unintelligible or there is no single obvious spelling, mark as [skip] in PeraPera or select the
appropriate reason from the 'Cannot transcribe' menu in Crowd Compute.

"कु रा"
[skip]
or similarly unintelligible word


Difficult utterances
Everything relating to problematic utterances (background noise, false starts, etc.) or different language varieties.

Skipping a prompt

The instructions in this section are for PeraPera. In Crowd Compute, instead of tagging utterances that cannot be transcribed as [skip], click the 'Cannot transcribe' button and select the appropriate reason.

If the prompt is difficult to understand, listen to the audio several times to try to understand the speaker. If you can understand the speaker, transcribe the utterance. However, if after replaying the audio you still cannot understand the speaker, skip.

Skip the utterance if it: contains at least some word(s) that cannot be understood; is in a different language typically not understood; contains no speech; contains only laughter; contains
singing; contains only synthesized speech (e.g. the voices of Google Now or Siri) and/or pre-recorded speech (e.g. TV or radio).

For utterances that contain both user-generated speech and pre-recorded or synthesized speech, transcribe user-generated speech and ignore the pre-recorded/synthesized speech.

कल का मौसम कैसा होगा? User asks, "कल का मौसम कैसा होगा?" Machine responds, "कल बारिश होगी।"

If a prompt contains nonsense words, search them on the internet. If no clear results are found and the word is unintelligible (there is no single obvious spelling), [skip] it.

[skip] Speaker says, "जो भी करो सावधानी" and then says something unintelligible.


If the speaker sings, [skip]. Use the tag [music] if an entire utterance is music from an instrument, radio, TV, etc.

[skip] if audio contains only laughter. Ignore laughter that is interspersed with speech (transcribe only the speech).

Profanity should be fully transcribed. Under very rare circumstances, extremely offensive profanity can be skipped.

If the context of an alpha-digit sequence suggests it may be a password, credit card number, social security number, etc., then use [skip].

Hesitations and truncations

Do not transcribe false starts unless they are complete words.

अमिताभ बच्चन "अमी- अमिताभ बच्चन"


बस थोड़ा और "ब- बस थोड़ा और"

बड़ा बहुत बड़ा "बड़ा- [pause] बहुत बड़ा"

If a user repeats a sentence for the sake of the phone, format the repetition as a sentence if it's restating (as a sentence) what the person has said.

रॉकस्टार के गाने डाउनलोड करो। रॉकस्टार के गाने डाउनलोड करो।

दिल्ली का मौसम दिखाओ। दिल्ली में मौसम कैसा है?

If the repeated phrase is part of the sentence that just happens to form a sentence on its own (possibly under a different interpretation), format it as a fragment.

किस मशीन से बगीचे को साफ करोगे? बगीचे को साफ करोगे While "weed a garden" can be a command, it is ambiguous and is most likely a fragment in this context.

Complete a truncated word only if a very small portion of the word is missing (one syllable or less in a multisyllable word) and it is obvious what the word should be. In
cases of ambiguity, do not transcribe the cut-off word. Do not put punctuation at the end of truncated words.

भूपति महेश भूपति "भूपति महेश भूपत" Final sound "इ" was truncated.

मनीष वसंत "मनीष वसंत कु" Unclear whether they would have said "कुमार" or "कुंज".

अिमताभ ब न की िफ "िमताभ ब न की िफ "

If a truncation occurs mid-quote, use an end quotation mark even if there is possibly more intended content.

सुनील बोला, "ये काम"

Transcribe repeated words as many times as uttered, but [skip] if a phrase is repeated more than five times.

इस घर में पांच पांच लोग रहते हैं। "इस घर में पांच अअअअ पांच लोग रहते हैं।"

[skip] "नमस्ते नमस्ते नमस्ते नमस्ते नमस्ते नमस्ते"
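
The repetition rule can be checked mechanically once the words are known. The following is a minimal sketch, assuming that "repeated more than five times" means six or more consecutive occurrences (as in the six-fold example above) and that a phrase is at most three words long; the function names, the `limit` and `max_phrase_len` parameters, and the romanized sample words are all hypothetical.

```python
def longest_repeat_run(tokens, max_phrase_len=3):
    """Return the largest number of back-to-back occurrences of any
    phrase of 1..max_phrase_len words in the token list."""
    best = 1 if tokens else 0
    for n in range(1, max_phrase_len + 1):
        for start in range(len(tokens) - n + 1):
            phrase = tokens[start:start + n]
            count = 1
            pos = start + n
            # Count how many times the same phrase follows immediately.
            while tokens[pos:pos + n] == phrase:
                count += 1
                pos += n
            best = max(best, count)
    return best


def should_skip_for_repetition(utterance, limit=5):
    """True when some phrase occurs more than `limit` times in a row."""
    return longest_repeat_run(utterance.split()) > limit
```

For example, `should_skip_for_repetition("namaste " * 6)` is true, while an utterance that repeats a word only twice is transcribed with both occurrences.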

For numbers, stick to what is uttered, even if you know this is not all the speaker is going to say.

टावर 2 "टावर दो-"

Do not transcribe filler words unless intended by the speaker to be transcribed. Never lengthen them.

मेरी कमर आह, दुख रही है। "मेरी कमर आआआह दुख रही है।"

"[sigh or loud breath]तुम ऐसा कहो।"


तुम ऐसा कहो।
Sounds like a full sentence.

He was like, "uh" "he was like uhhhh"

वो ना बहुत बदमाश है।

Background and foreground speech

Only transcribe foreground speech. A user's speech may go from the foreground to the background or vice versa (determined by change in volume) and can be accompanied by change
in speaker audience.

बगलु Speaker says loudly, "बगलु " and then quietly, "आप बस बोलो और वो ढू ं ढे गा बगलु ।"

शान होटल कहां है ? हम वहीं जा रहे है ना? The speaker changes audience but not volume, so transcribe both sentences.

If one person clearly speaks in the foreground and someone speaks in the background, transcribe the main speaker and ignore the rest.

राधा को कॉल करो। Foreground speaker said, "राधा को कॉल करो।"; background speaker said, "मीरा को कॉल करो।"

If two people take turns, without overlap, and are both in the foreground at roughly the same volume, transcribe the speech of both speakers. Separate the dialogue of different speakers
with end punctuation.

तुम कहां गए थे? मैं मंदिर गया था। First speaker asked "तुम कहाँ गए थे?", other person answered "मैं मंदिर गया था।"

पानी पूरी। पर मुझे तो आलू टिक्की चाहिए थी। "पानी पूरी" is a fragment, but use a period to separate the speech of different speakers.

If two or more people are speaking at once with no one clearly in the foreground, tag as [overlapping]. Do this for overlaps longer than one second. Use this tag even when one person is
a bit louder than the other(s) and you can tell what they're saying.
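
The one-second threshold can be illustrated with speech intervals. This is only a sketch under stated assumptions: transcribers judge overlap by ear, so the (start, end) timestamps in seconds, the function names, and the `clear_foreground_speaker` flag are hypothetical and do not correspond to anything exposed by PeraPera or Crowd Compute.

```python
def overlap_seconds(a, b):
    """Length of the time span shared by two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def needs_overlapping_tag(a, b, clear_foreground_speaker=False,
                          threshold=1.0):
    """True when two speakers talk at once for longer than `threshold`
    seconds and neither is clearly in the foreground."""
    if clear_foreground_speaker:
        # Transcribe the main speaker and ignore the rest instead.
        return False
    return overlap_seconds(a, b) > threshold
```

With speakers at (0.0, 3.0) and (1.5, 4.0) the shared span is 1.5 seconds, so the utterance would be tagged [overlapping]; a shared span of 0.5 seconds would not.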

Foreign language

Do not skip utterances that contain words in English. Most of them should be transcribed using Devanagari characters. Only use Latin characters for English words if they are
measurement units, URLs, company names or tech words.

हेलो
Use Devanagari characters for common words in English
NOT: hello

mp3, jpeg
Use Latin characters for technical words
NOT: ्३, पेग्

km, MB, C, dB Use Latin characters for measurement units

YouTube. Samsung. Gmail. Use Latin characters for company names that are not normally written in Devanagari


हैरी पॉटर Use Devanagari for media titles


NOT: Harry Potter

www.google.co.in Use Latin characters for URLs
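
The script choice for English words comes down to a small lookup plus a check of what was typed. The sketch below is illustrative only: the category labels and function names are hypothetical, deciding which category a word belongs to is still the transcriber's call, and the company-name case applies only to names not normally written in Devanagari. The script check relies on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical category labels for English words that stay in Latin script.
LATIN_CATEGORIES = {"measurement_unit", "url", "company_name", "tech_word"}


def script_for_english_word(category):
    """Latin for the listed categories, Devanagari for everything else
    (common words, media titles, and so on)."""
    return "Latin" if category in LATIN_CATEGORIES else "Devanagari"


def is_devanagari(text):
    """True when every non-space character is in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(
        unicodedata.name(c, "").startswith("DEVANAGARI") for c in chars
    )
```

A transcript containing "hello" in Latin letters would fail `is_devanagari`, which flags it for rewriting as "हेलो".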

If words in a foreign language are included in a sentence of your target language, transcribe only if commonly understood by speakers of your language. Otherwise, [skip]. Foreign words
that are commonly used (and therefore should be transcribed) can include names of foreign foods or places, pop culture phrases like "capisce", and greetings or thank yous in prominent
world languages.

चलो डिम सम खाएं।

पिज़्ज़ा हम ओवन में बनाते हैं।

छोटों से लेकर बड़ों तक को आइसक्रीम खाना पसंद होता है।

हेलो सर, क्या हाल है? In Hindi, common English words and phrases like "hello" should be transcribed if they are included in Hindi sentences.

The following tips will help you if you're using Chrome input tools extension (https://chrome.google.com/webstore/detail/google-input-tools/mclkkofklk jcocdinagocijmpgbhab) to
generate Devanagari characters using an English keyboard. Please note that the correct characters may not be the first choice, so please choose the correct word in accordance with
these guidelines.

How to type vowels using an English keyboard.

अ अ (a) => अनार (anaar), घर (ghar)

आ आ (aa) => आम (aam), काम (kaam), कराया (karaayaa)

इ इ (i) => इमली (imlii), परिचय (parichay), यदि (yadi)

ई ई (ii) => ईनाम (inaam), झील (jheel), गाडी (gaadi)

उ उ (u) => उषा (ushaa), कछुआ (kachhuaa), किन्तु (kintu)

ऊ ऊ (uu) => ऊन (uun), खून (khuun), उल्लू (ullu)

ओ ओ (o) => ओशो (osho), समोसा (samosaa), जानो (jaano)

औ औ (au) => औरत (aurat) and मुखौटा (mukhauta)

ऑ ऑ (au) => ऑन (on), सॉ ग (song) and डॉग (dog)

ए ए (e) => एक (ek), अनेक (anek), गट्टे (gatte)

ऐ ऐ (ae) => ऐतिहासिक (aetihaasik)

How to type consonants using an English keyboard.

क क (k) => करता (kartaa) [to do]

ख ख (kh) => खराब (kharaab) [bad]

ग ग (g) => गमला (gamlaa) [pot]

घ घ (gh) => घर (ghar) [house]

च च (ch) => चम्मच (chammach) [spoon]

छ छ (chh) => छतरी (chatrii) [umbrella]

ज ज (ja) => जानवर (jaanwar) [animal]

झ झ (jha) => झरना (jharnaa) [stream]

ट ट (tta) => टमाटर (tamaatar) [tomato]

ठ ठ (ttha) => ठंड (thand) [cold]

ड ड (dda) => डाकिया (daakiya) [postman]

त त (ta) => तोता (totaa) [parrot]

थ थ (tha) => थोड़ा (thoda) [a little]

द द (da) => दीदी (didi) [sister]

ध ध (dha) => धनिया (dhaniyaa) [coriander]

प प (pa) => पागल (paagal) [mad]

फ फ (pha) => फल (phal) [fruit]

ब ब (ba) => बर्तन (bartan) [vessel]

भ भ (bha) => भालू (bhaalu) [bear]

म म (ma) => माता (maataa) [mother]

य य (ya) => यार (yaar) [friend]

र र (ra) => रोना (rona) [to cry]

ल ल (la) => लाल (lal) [red]

व व (va) => वीर (viir) [brave]

ह ह (ha) => हैरी (Harry)

How to type two of the same consonants in a row using an English keyboard.

क्ष (ka + virama + ssa) => वृक्ष (vriksh) [tree]

बच्चा bachcha

चम्मच chammach

छत्ता chatta

उज्ज्वल ujjval

How to type two different consonants in a row using an English keyboard.

चिट्ठी chitthi

त्याग tyaag

ज्वर jwar

प्यार pyaar

अच्छा acchaa

How to type words that involve "r" using an English keyboard.

पर्वत parvat

मंत्र mantr

इंद्र indr

Accents

Correct non-standard pronunciations to their standard ones. Non-standard pronunciations could be from speakers of regional dialects, language learners, or speakers from different
countries.

मेरी भाषा एकदम बढ़िया है। "मेरी भाषा एकदम भड़िया है"
NOT: मेरी भाषा एकदम भड़िया है। Person said "भड़िया" with a "भ" sound, but it should still be spelled as standard.

बगलु
Person said "बंगलूर" but it should still be spelled as standard.
NOT: बंगलूर
