
10/4/2017 ro-RO_TEST_SET

**This document is confidential, do not redistribute**

Romanian Written Domain


Conventions
You will listen to some audio files and transcribe them according to the rules of your locale.

TRANSCRIPTION QUALITY: Typo, Context error, Added or missing words, Substitution, Spacing

PUNCTUATION: Fragments versus sentences, Commas, Intonation marks, Colon and quotation, Other symbols, Spoken punctuation

FORMAT: Number, Currency and unit, Date and time, Address, Web, Abbreviation

AGREED SPELLING: Spelling out, Interjections, Proper names, Brand and product, Media title, Multiple spellings

DIFFICULT UTTERANCES: Skipping a prompt, Hesitations and truncations, Background and foreground speech, Foreign language, Accents

https://speech.google.com/annotation/guidelines/ro_ro_test_set/index.html

Transcription quality
Comply with the standard rules of the writing system.

Typo

A typo results in the unintentional creation of a non-word.

Avoid making any typographical errors. Carefully check your work before marking
items as "complete".

Caută-mă pe Facebook.
NOT: Caută-mă pe Facebok.

Caută pe Google.
NOT: Caută pe Gogle.

Context error

A context error occurs when a real word is used incorrectly or when the incorrect
form of a word is used. This includes homophones and punctuation, among other
things.

S-a născut în luna mai.


NOT: Sa născut în luna m-ai.

Tânăra a roșit când a fost complimentată.


NOT: Tânăra a roșit când a fost complimente.

El ia autobuzul.
NOT: El i-a autobuzul.

Added or missing words

Do not transcribe words that are not spoken, even if they are obviously intended by
the speaker. Avoid putting words in the speaker's mouth. However, do transcribe
implied times and units of currency.

Vreau să merg să văd X-men.
NOT: Vreau să merg să văd filmul X-men.
"vreau să merg să văd x men"
Do not add the omitted "filmul".

3.49 RON e prea mult pentru bomboane.
NOT: 3 49 e prea mult pentru bomboane.
"trei patruzeci și nouă e prea mult pentru bomboane"
Implied currency, add "RON" even though it is not spoken.

Maria e la spital George și m-a lăsat pe mine de veghe.
NOT: Maria e la spital cu George și m-a lăsat pe mine de veghe.
"maria e la spital george și m-a lăsat pe mine de veghe"
Do not add the omitted "cu".

Transcribe all words spoken, even if they are not intended by the speaker. For
interjections and non-speech vocalizations, refer to Agreed Spelling > Interjections
and Difficult Utterances > Hesitations and Truncations.

YouTube YouTube
YouTube

Cât costă kilogramul de mere pere?
NOT: Cât costă kilogramul de pere?
Speaker clearly corrected themselves after mistakenly saying "mere" instead of "pere".

Te-am sunat joi vineri.
Speaker clearly corrected themselves after mistakenly saying "joi".

Substitution

Spacing

Use only one space between words and sentences.

Care este cea mai înaltă clădire din New York?


NOT: Care este cea mai înaltă clădire din_New York?

Așa cred. Hai să încercăm.


NOT: Așa cred. _Hai să încercăm.

For most types of punctuation, do not put a space between the preceding word and
the punctuation.

Vii?
NOT: Vii ?

Bună ziua, sunt Dr. Smith.


NOT: Bună ziua, sunt Dr . Smith.
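
The two spacing rules above can be checked mechanically. The following is a minimal sketch, not part of the guidelines: the function name is hypothetical, and the set of punctuation marks it treats as "most types of punctuation" is an assumption.

```python
import re


def normalize_spacing(text: str) -> str:
    """Collapse repeated spaces and drop a space placed before punctuation."""
    # Use only one space between words and sentences.
    text = re.sub(r"\s+", " ", text).strip()
    # Assumed punctuation set: ? ! . , ; :
    return re.sub(r"\s+([?!.,;:])", r"\1", text)


print(normalize_spacing("Vii ?"))
print(normalize_spacing("Așa cred.  Hai să încercăm."))
```

A check like this only flags spacing. It cannot decide whether the punctuation itself is correct.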


Punctuation
Follow the punctuation regulations of your locale. Additional conventions are outlined in this
section.

Fragments versus sentences

Add punctuation where needed, but err on the side of keeping it minimal.

Sometimes a phrase which is not obviously grammatically a sentence should
nevertheless be treated as a sentence because of its context, e.g. if it's an answer to a
specific question, or if it's an example where dropping the subject sounds completely
natural as a complete sentence.

Despre cine vorbești? De vecinul de lângă.
Two speakers. "De vecinul de lângă." is an answer to a specific question.

de vecinul de lângă
No context to suggest this is a sentence; treat it as a fragment.

Vii la petrecere mâine?
Dropping the subject here sounds natural as a complete sentence. Punctuate as sentence.

spălat căței în cadă
Sounds like a web search as opposed to a dropped subject. Punctuate as fragment.

Este foarte de treabă.
Dropping the subject here sounds natural as a complete sentence. Punctuate as sentence.

Interjections, greetings, and farewells said in isolation should be considered
complete sentences and punctuated as such.

La naiba. interjection

Bună. greeting

Noroc prietenului meu cel mai bun. Entire phrase is being used as an interjection.

Capitalize sentence fragments that sound like the beginning of a sentence. Add end
punctuation to sentence fragments that sound like the end of a sentence. For
fragments that do not clearly sound like the beginning or end of a sentence, leave out
capitalization and punctuation. Note that sentence fragments may be a result of cut-
off audio samples.

Ce crezi? Nu e ca și cum
Begins as complete sentence and ends mid-stream.

mult mai greu. Nu are sens.
Fragment is the end of a sentence.

foarte greu, așa că nu te descuraja.
Audio was cut off at the beginning.

Mă duc la cafenea. O să comand Cât costă un cappuccino?
Do not put a period, hyphen, or ellipsis after a fragment even if another sentence follows.

Cât este Unde este
Both sound like beginnings of sentences.

Unde este Unde este plaja?
Repeated beginning of the sentence.

plecau dar apoi au decis să
Sounds like the middle of a sentence; beginning and end were cut off.

If an utterance is not clearly a sentence according to the above rules and examples,
do not capitalize or punctuate it as a sentence.

Commas

Only use commas where required. Err on the side of minimal punctuation. Do not
rely on intonation.

Unde este cea mai apropiată gară?
NOT: Unde este, cea mai apropiată, gară?
Even if the speaker uses long pauses in these places, do not use a comma. There are
places where commas are allowed or required, but this example contains neither.

Use a comma when a sentence starts with a discourse word, interjection, or yes/no
word. However: If there is a long pause between a discourse word, interjection, or
yes/no word and a full sentence that follows it, treat that initial word as a separate
sentence.

Păi, credeam că ești cu cineva.
Discourse word. Other examples of discourse words in Romanian include "dar", "așa", "de fapt", and "totodată".

Scuze, iubito.
Interjection. Other examples of interjections include "uau", "hei", "ha ha", and others.

Da, o fac.
Yes/no word.

Sigur, pot să fac asta.
Yes/no word. No pause after "Sigur".

Sigur. Pot să fac asta.
Substantial pause after "Sigur".

Bine. Asta e foarte plăcut.
Substantial pause after "bine".

The phrase "Ok Google" in isolation is transcribed without a comma or end
punctuation. When the phrase appears before longer utterances, place a comma
after "Google".

Ok Google

Ok Google, poze cu măslini

Ok Google, arată-mi informațiile de contact pentru Dean.

Ok Google, când este Paștele anul acesta?

Intonation marks

Capitalize and punctuate the following as questions: 1) All queries syntactically built
as questions, regardless of intonation. 2) All queries which sound like they are being
used as questions, regardless of sentence structure.

Vorbești serios?
Syntactically built as a question, so punctuate as a question regardless of intonation.

La 3:00 dimineața?
Rising intonation suggests it is a question, so punctuate as a question regardless of structure.

vremea în Tucson
Query uses rising intonation, but is most likely a web search rather than a true question.

If a speaker uses clearly exclamatory intonation, use an exclamation point. If there is
any doubt, err on the side of using a period.

Super! Speaker sounds enthusiastic.

Super. Speaker sounds unenthused.

Colon and quotation

Use Romanian quotation marks "„...”".

Traduce „ce mai faci” în germană.


NOT: Traduce "ce mai faci" în germană.

Use a colon between reported speech verbs and direct quotations. When the
quotation is a full sentence, it should be capitalized.

Prietenul meu a spus: „aligator crocodil”.
NOT: Prietenul meu a spus, „aligator crocodile”.
NOT: Prietenul meu a spus „aligator crocodile.”
NOT: Prietenul meu a spus „aligator crocodile”.
The word "spune" is the most common reported speech verb in Romanian, but other
words ("cere", "răspunde") can be used for reported speech.

Spune „onomatopee”.
NOT: Spune: „onomatopee”.
Omit the colon if the verb is in the imperative.

Ana a spus: „Ne vedem după film”.
NOT: Ana a spus „Ne vedem după film”.
Use a colon before the direct quotation.

When the sentence starts with the quotation, use a comma between the quotation
and reported speech verbs.

„Leu de mare”, a spus Ana.

„Ana știe ce îmi place”, se miră Andrei.
NOT: „Ana știe ce îmi place“ se miră Andrei.
Use a comma between reported speech verbs and direct quotation.

When the quotation qualifies as a sentence, question marks and exclamation marks
should be placed inside the quotation marks. Periods, on the other hand, should be
kept outside the quotes.

„Unde te duci?”, a întrebat mama sa.

Mama sa a întrebat: „Unde te duci?”

„Ce minunat!”, a exclamat el.

El a exclamat: „Ce minunat!”

Ann a spus: „Hai să ne vedem la 3:00”.
NOT: Ann a spus, „Hai să ne vedem la 3:00.”.
The text in quotation marks qualifies as a sentence. Do not add excess punctuation.

Jane a întrebat: „Ne întâlnim la 3:00?”
NOT: Jane a întrebat, „Ne întâlnim la 3:00?”.
The text in quotation marks qualifies as a sentence. Do not add excess punctuation.

Josh a spus ceva de genul: „O să fiu acolo. Promit” fără să își verifice calendarul.
NOT: Josh a spus ceva de genul: „O să fiu acolo. Promit.” fără să își verifice calendarul.
The text in quotation marks qualifies as two sentences. The period ending the
quotation should be omitted.

Use a colon but no quotation marks in quotative voice actions when the quote
follows the command. Use quotation marks when the quote is in the middle of the
sentence.

Tradu în franceză: Ce mai faci?
The quote follows the command, so use a colon and omit quotation marks.

Tradu „Care este numele tău?” în franceză.
The quote is in the middle of a sentence, so use quotation marks and omit the colon.

Cum spui „Te iubesc” în franceză?
NOT: Cum spui: „Te iubesc” în franceză?
Omit commas after "spui" verbs in translation requests.

Către example@gmail.com: Hei, cum a fost ziua ta?

Trimite email către example@gmail.com cu: Hei, cum a fost ziua ta?

Other symbols

Apart from the Latin letters a through z, you should not use any other symbol than:
0-9 âàăäéèëêîșțüùÂÀĂÄÉÈËÊÎȘȚÜÙ²³,?!~^\'"„”_°:.()<>{}[]√/@#$€£+=%*&-.;

The characters s-cedilla (ş) and t-cedilla (ţ) should not be used. The authorized
characters are s-comma (ș) and t-comma (ț).
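
The symbol list and the cedilla rule above lend themselves to an automatic check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that uppercase Latin letters and the space are allowed and that the backslash in the list above is meant literally.

```python
import string
import unicodedata

# Extra symbols copied from the list above (assumption: the backslash is literal).
EXTRA_SYMBOLS = "âàăäéèëêîșțüùÂÀĂÄÉÈËÊÎȘȚÜÙ²³,?!~^\\'\"„”_°:.()<>{}[]√/@#$€£+=%*&-.;"
# Assumption: uppercase A-Z and the space character are also permitted.
ALLOWED = set(string.ascii_letters + string.digits + " " + EXTRA_SYMBOLS)

# Cedilla forms are not authorized; map them to the comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s-cedilla -> s-comma
    "\u0163": "\u021b",  # t-cedilla -> t-comma
    "\u015e": "\u0218",  # S-cedilla -> S-comma
    "\u0162": "\u021a",  # T-cedilla -> T-comma
})


def normalize_cedillas(text: str) -> str:
    """Replace s/t with cedilla by s/t with comma below."""
    return text.translate(CEDILLA_TO_COMMA)


def disallowed_chars(text: str) -> list:
    """Return the sorted characters of text that are outside the allowed set."""
    composed = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in composed if ch not in ALLOWED})


print(disallowed_chars("200 ¥"))
```

Running `normalize_cedillas` before `disallowed_chars` keeps legacy cedilla input from being reported as an error when it can simply be converted.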

Spoken punctuation

For sentence-level spoken punctuation, write out the full word or words between
curly brackets. Do not add punctuation symbols after spoken punctuation. Be careful
with homonyms. (See exceptions in the next rule.)

Ce mai faci {semnul întrebării}
NOT: Ce mai faci?
NOT: Ce mai faci semnul întrebării
NOT: Ce mai faci semnul întrebării?
"ce mai faci semnul întrebării"

Bine {punct} {punct} {punct}
NOT: Bine...
"bine punct punct punct"

Don't spell out internal punctuation like hyphens in web pages, email addresses,
addresses, phone numbers, or other word-level punctuation.

E actriță/model.
NOT: E actriță {slash} model.
NOT: E actriță slash model.
"e actriță slash model"

If a word that can refer to a punctuation mark is spoken in isolation, it should be
written out between curly brackets.

{punct}

Treat spoken punctuation as you would regular symbols, and capitalize the following
sentence as normal.

Plec acum {punct} Cât durează călătoria?
NOT: Plec acum {punct} cât durează călătoria?
NOT: Plec acum punct Cât durează călătoria?
"plec acum punct cât durează călătoria"

Maria e la spital cu George și m-a lăsat pe mine de veghe {punct} Dacă intervine ceva,
pot s-o aduc pe Julia la tine {semnul întrebării}
NOT: Maria e la spital cu George și m-a lăsat pe mine de veghe {punct} dacă intervine
ceva, pot s-o aduc pe Julia la tine {semnul întrebării}
NOT: Maria e la spital cu George și m-a lăsat pe mine de veghe. Dacă intervine ceva, pot
s-o aduc pe Julia la tine?
"maria e la spital cu george și m-a lăsat pe mine de veghe punct dacă intervine ceva pot
s-o aduc pe julia la tine semnul întrebării"


Format
Transcribe numbers, abbreviations etc. following the formatting conventions in this section.

Number

Cardinals and ordinals from 0 to 9 are written with letters (except for measures and
currency - see Currency and Unit). Use digits for cardinals and ordinals 10 and above,
even if they are coordinated with numbers under 10. Transcribe all decimal numbers
as digits.

o clasă de nouă copii
numbers less than 10

o clasă de 13 copii
numbers greater than 9

3,14
"trei virgulă patrusprezece"
decimal numbers
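
As an illustration of the number rule above, here is a small sketch. The function name and the `measure` flag are hypothetical, and the word list uses only the base (masculine) forms, so forms such as "o", "un" or "două" would still need to follow what the speaker actually says.

```python
from decimal import Decimal

# Base forms of 0-9; gendered variants are not handled in this sketch.
SMALL_NUMBERS = ["zero", "unu", "doi", "trei", "patru",
                 "cinci", "șase", "șapte", "opt", "nouă"]


def format_cardinal(value, measure=False):
    """Write 0-9 with letters, 10 and above with digits, decimals with a comma."""
    number = Decimal(str(value))
    if number != number.to_integral_value():
        # Decimal numbers are always written as digits, with a decimal comma.
        return str(number).replace(".", ",")
    n = int(number)
    if 0 <= n <= 9 and not measure:
        return SMALL_NUMBERS[n]
    # Measures and currency use digits even below 10.
    return str(n)


print(format_cardinal(9), format_cardinal(13), format_cardinal("3.14"))
```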

If a large number consists of only a number followed by "milion", "miliard", or higher,
then transcribe as a numeral plus word. Otherwise, transcribe as numerals.

1 milion de gâște "un milion de gâște"

1000 de gâște "o mie de gâște"

1,5 miliarde $
"unu virgulă cinci miliarde de dolari"
"un miliard și jumătate de dolari"
For mixed numbers before "milion", "miliard", etc., use decimals.

Pământul are 7 miliarde de locuitori.
"pământul are șapte miliarde de locuitori"

Write lists of numbers with digits and without commas.

0 1 1 2 3 5 8 13 "zero unu unu doi trei cinci opt treisprezece"

3 2 1 lansare "trei doi unu lansare"
list of numbers, no comma if just counting

In math expressions or units & measures, transcribe fraction words using numerals
and slashes.

Au nevoie de 1/4 de kg de zahăr.
NOT: Au nevoie de ¼ de kg de zahăr.
NOT: Au nevoie de 1 / 4 de kg de zahăr.
"au nevoie de un sfert de kilogram de zahăr"
Here, the "un" before "sfert" is part of the fraction, so don't include it in the
transcription. Also, be careful not to include spaces or pre-combined fraction characters.

În 3/4 de milă, fă dreapta.
NOT: În trei sferturi de milă, fă dreapta.
"în trei sferturi de milă fă dreapta"
If spoken, include "de" after the fraction.

În 2/3 km, fă stânga.
"în două treimi kilometru fă stânga"
If the speaker does not use "de" after the fraction, leave it out of the transcription.
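
The fraction examples above follow one pattern: numerator word plus denominator word becomes numeral, slash, numeral, with no spaces. A minimal sketch, assuming a hypothetical helper and a deliberately tiny vocabulary:

```python
# Hypothetical lookup tables covering only a few common fraction words.
NUMERATORS = {"un": 1, "o": 1, "două": 2, "trei": 3}
DENOMINATORS = {"jumătate": 2, "treime": 3, "treimi": 3,
                "sfert": 4, "sferturi": 4}


def spoken_fraction(phrase):
    """Turn a two-word spoken fraction into 'n/d' with no spaces."""
    numerator_word, denominator_word = phrase.lower().split()
    return f"{NUMERATORS[numerator_word]}/{DENOMINATORS[denominator_word]}"


print(spoken_fraction("un sfert"))
print(spoken_fraction("trei sferturi"))
```

The "un" is consumed as the numerator, which matches the note that it should not appear separately in the transcription.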

For mixed numbers that represent currency amounts, always use decimals.

Poți să îmi împrumuți 2,50 $? "poți să îmi împrumuți doi dolari jumate"

A cumpărat casa de pe plajă pentru 7,5 milioane $. "a cumpărat casa de pe plajă pentru șapte milioane jumate de dolari"

9,50 RON "nouă lei cincizeci"
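
A sketch of the currency formatting shown above, under stated assumptions: the helper name is hypothetical, fractional amounts are padded to two decimals, and whole amounts are left without decimals (as in "5 RON" later in this document).

```python
from decimal import Decimal, ROUND_HALF_UP


def format_amount(amount, symbol):
    """Format a currency amount with a decimal comma and a trailing symbol."""
    value = Decimal(str(amount))
    if value == value.to_integral_value():
        # Whole amounts carry no decimals.
        text = str(int(value))
    else:
        rounded = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        text = str(rounded).replace(".", ",")
    return f"{text} {symbol}"


print(format_amount("2.5", "$"))
print(format_amount("9.5", "RON"))
```

Amounts expressed with "milion" or "miliard" are a separate case and are not covered by this helper.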

Transcribe percentages using numerals and the % sign. (In the unlikely case that you
encounter a number of a million or greater used as a percentage, spell it out.)

lapte cu 2% grăsime "lapte cu doi la sută grăsime"

1 milion la sută

If a number appears in a context which calls for a certain formatting in your
language, use that formatting. Otherwise, default to the general rule for transcribing
numbers.

Transcribe phone numbers using the most common format in the transcription
language.

+40 21 335 44 55
"plus patruzeci doi unu trei trei cinci patru patru cinci cinci"
landline with country code (the leading "0" is removed)

021 321 11 11
"zero doi unu trei doi unu unu unu unu unu"
landline with two-digit area code preceded by the leading "0"

021 888 22 21 interior 20
"zero doi unu opt opt opt doi doi doi unu interior douăzeci"

0268 335 335
landline with three-digit area code preceded by the leading "0"

0749 123 123
mobile phone number

0800 123 456
toll-free number

0901 123 456
premium-rate services
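
The groupings in the phone examples can be reproduced with a short helper. This is a sketch inferred from those examples only: the function name is hypothetical, "21" is assumed to be the only two-digit area code, extensions ("interior") are not handled, and the international form for other prefixes is an extrapolation.

```python
import re

# Assumption: only the Bucharest code from the examples has two digits.
TWO_DIGIT_AREA_CODES = ("21",)


def format_ro_phone(raw):
    """Group a Romanian phone number the way the examples above do."""
    digits = re.sub(r"\D", "", raw)
    international = raw.strip().startswith("+")
    if international:
        if not digits.startswith("40"):
            raise ValueError("expected country code +40")
        national = digits[2:]
    else:
        if not digits.startswith("0"):
            raise ValueError("expected a leading 0")
        national = digits[1:]
    if len(national) != 9:
        raise ValueError("expected 9 digits after the prefix")
    if national.startswith(TWO_DIGIT_AREA_CODES):
        groups = [national[:2], national[2:5], national[5:7], national[7:]]
    else:
        groups = [national[:3], national[3:6], national[6:]]
    prefix = "+40 " if international else "0"
    return prefix + " ".join(groups)


print(format_ro_phone("+40213354455"))
print(format_ro_phone("0749123123"))
```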

If it really sounds like a math expression, then transcribe it with numbers and
symbols, with spaces in between.

5/6^3 "cinci împărțit la șase la puterea a treia"

Cât e 5 * 6? "cât e cinci ori șase"

√3 "radical din trei"

Cât înseamnă 8 ore * 12 $? "cât înseamnă opt ore ori doisprezece dolari"

Cât înseamnă trei crocodili împărțiți la două iguane?
Does not sound like a true math expression with useful units.

Currency and unit

Transcribe currencies as commonly written in the transcription language.

5 RON "cinci lei"

10 $ "zece dolari"

Cât înseamnă 20 € în dolari? "cât înseamnă douăzeci euro în dolari"

30 bani "treizeci de bani"

For all other currencies and slang terms for money, spell out the words.

Am cheltuit cinci miare.
NOT: Am cheltuit 5000 $.
"am cheltuit cinci miare"

200 yeni
NOT: 200 ¥
"două sute de yeni"

For degrees, use the ° symbol.

Sunt 20° afară.

Sunt minus cinci în Milwaukee.
NOT: Sunt -5 în Milwaukee.
"sunt minus cinci în milwaukee"

Abbreviate all units that follow numeric values.

Familia mea a cumpărat 10 L de suc de portocale. "familia mea a cumpărat zece litri de suc de portocale"

măsoară 5 kg "măsoară cinci kilograme"

If it is clear from context that a number or number sequence refers to currency or
time, format it as such.

Pune alarma la 5:45.
NOT: Pune alarma la 545.
"pune alarma la cinci patruzeci și cinci"

Date and time

Use the natural form for transcribing dates.

12 iulie 1964 "douăsprezece iulie o mie nouă sute șaizeci și patru"

toamna din '78 "toamna din șaptezeci și opt"

muzica anilor '80 "muzica anilor optzeci"

miercuri, 6 martie "miercuri șase martie"

Use the natural form for transcribing times whenever possible.

Write times in hh:mm format whenever possible, unless it would look unnatural to
do so.

3:00 "ora trei"

Pune alarma la 4:00. "pune alarma la ora patru"

1:50 "două fără zece"

3:15 "trei și cincisprezece"

ora 16:00 "ora șaisprezece"
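
The "fără" example above ("două fără zece" becomes 1:50) involves a small piece of arithmetic: subtract the minutes from the named hour. A minimal sketch with hypothetical helper names. Note that wrapping "unu fără zece" to 0:50 is an assumption, and a transcriber may well prefer 12:50 there.

```python
def format_time(hour, minute=0):
    """Render a time as h:mm, without a leading zero on the hour."""
    return f"{hour}:{minute:02d}"


def fara(hour, minutes_before):
    """Time for '<hour> fără <minutes>': that many minutes before the hour."""
    # Assumption: hours wrap on a 24-hour clock.
    return format_time((hour - 1) % 24, 60 - minutes_before)


print(format_time(3))
print(fara(2, 10))
print(format_time(16))
```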

Address

Favor full spellings over abbreviations where natural, but use abbreviations when
explicitly spoken.

Use commas for ENTITY, LOCATION.

Café Verona, București

Bioclinica, Calea Călărașilor

Aleea Alexandru 31, sector 1, București, 011822

Craigslist, Detroit

Web

Write URLs, email addresses, and Twitter hashtags as they are spoken and don't
capitalize them.

amazon.com "amazon punct com"

http://123.com "h t t p două puncte bară bară un doi trei punct com"

mike@example.org "mike a rond example punct org"

Iubesc pizza. #înfometat "iubesc pizza haștag înfometat"

If the speaker drops a "w" or dots and it's an obvious URL, you should correct these
errors. If the speaker doesn't say the "w"s at all, do not add them.

www.amazon.com
"w w punct amazon punct com"
If the user mistakenly says "ww", transcribe "www".

www.123.com
"w w w un doi trei punct com"
Also transcribe the dot in an obvious URL, even if the speaker did not include it.
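
The URL and email examples above can be approximated by a token mapping. The sketch below is an assumption-laden illustration, not a tool referenced by these guidelines: the function name and vocabulary tables are hypothetical, hashtags are not handled, and any spoken digit word is converted even where a real domain might contain that word.

```python
import re

# Multi-word spoken symbols, replaced before tokenizing.
PHRASES = {r"\ba rond\b": "@", r"\bdouă puncte\b": ":"}
# Single-word spoken symbols and digits.
WORDS = {"punct": ".", "bară": "/", "zero": "0", "un": "1", "unu": "1",
         "doi": "2", "trei": "3", "patru": "4", "cinci": "5",
         "șase": "6", "șapte": "7", "opt": "8", "nouă": "9"}


def spoken_to_url(spoken):
    """Convert a spoken URL or email address into its written, lowercase form."""
    text = spoken.lower()
    for pattern, symbol in PHRASES.items():
        text = re.sub(pattern, symbol, text)
    tokens = [WORDS.get(token, token) for token in text.split()]
    # Count separately spoken leading "w" letters.
    leading_w = 0
    while leading_w < len(tokens) and tokens[leading_w] == "w":
        leading_w += 1
    if leading_w in (2, 3):
        # Repair a dropped "w" and a dropped dot; never add "www" if no "w" was said.
        rest = tokens[leading_w:]
        if rest and rest[0] == ".":
            rest = rest[1:]
        tokens = ["www."] + rest
    return "".join(tokens)


print(spoken_to_url("h t t p două puncte bară bară un doi trei punct com"))
```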

Abbreviation

Do not abbreviate unless the speaker says an abbreviated form.

Pandurii Tg-jiu
NOT: Pandurii Târgu-Jiu
"pandurii t g jiu"

Steaua versus Dinamo
NOT: Steaua vs. Dinamo
"steaua versus dinamo"

Capitalize and abbreviate titles for people only when they precede proper names.

Îl cunosc pe Dr. Popescu.

Doctorul meu îmi spune să fac mai multă mișcare.

A venit și Carl Rove Jr.

Joacă la juniori.

L-am văzut pe Președintele Obama la TV azi.

L-am văzut pe președinte la TV azi.

In acronyms, do not use periods between letters.

MP3, SUA, ONU

A văzut un OZN.


Agreed spelling
Spelling conventions for words where several options are possible, as well as proper names.

Spelling out

If a word is spelled or obvious pauses are made between letters, spell it into letters as
it is said (often done for foreign names or businesses, for example). Use lowercase
letters for the spelled-out portion. This rule does not apply to acronyms or
initialisms, or to spelled-out web or email addresses.

Ivanov i v a n o v Person said "ivanov" and then spelled it.

VIPuri spelled out "v i p" with plural "-uri"

Interjections

Transcribe words representing laughter or other non-speech vocalizations with up to
three syllables, but no more.

he, ha, haha, hahaha

hahaha "ha ha ha ha ha"
NOT: hahahahaha
Do not transcribe more than three syllables.
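
The three-syllable cap above amounts to truncating the spoken syllables. A minimal sketch, assuming a hypothetical helper that receives the laughter as space-separated syllables:

```python
def cap_laughter(spoken, max_syllables=3):
    """Join spoken laughter syllables, keeping at most three of them."""
    syllables = spoken.lower().split()
    return "".join(syllables[:max_syllables])


print(cap_laughter("ha ha ha ha ha"))
```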

Ignore actual laughter that is included within speech. If the entire audio contains only
laughter, use the [skip] tag in PeraPera or select the appropriate reason from the
'Cannot transcribe' menu in Crowd Compute.

Știu! actual laughter followed by "Știu" with clear exclamatory intonation

[skip] Entire audio contains only actual laughter, sighs, etc.

Proper names

Use official spelling, capitalization, and punctuation for proper names. Google them
and pay attention to the correct format. Official format and spelling of a proper name
may supersede the usual written transcription conventions detailed in this
document.

Defer to official spellings of celebrity names.

Vasile Alecsandri
NOT: Vasile Alexandri
Romanian poet

The 5.6.7.8's Spelled this way in privacy policy.

will.i.am

Kristin Chenoweth
The celebrity spells her name differently than the more common "Kristen".

If a personal name could have multiple spellings and context does not help choose a
spelling, use the spelling that yields the most Google search hits when you search for
the name followed by the word "name" (without quotation marks) (e.g. "Anna name").

McDonald
NOT: MacDonald
Searching "McDonald name" yields more search results than "MacDonald name". Note
that this example refers to the surname, not the restaurant chain.

If a speaker makes a small mistake in a proper name, capitalize it anyway as long as
the difference is minimal. "Minimal differences" refers to adding or dropping articles,
possessives, and plurals.

The Lord of the Ring for "The Lord of the Rings"

Matrix for "The Matrix"

craiglist "craig list"; actual name "craigslist"

Milionar de weekend The actual name is "Milionari de weekend".

Brand and product

Format proper names as they are most commonly formatted on the entity's website
(especially official documents), if available, or the Wikipedia or IMDb page. In cases of
ambiguity, defer to their privacy policy. If no other sources, use top Google hits.

Lucrează la Amazon.

Am auzit că Yahoo și T-Mobile au ajuns la o înțelegere.

Toys "R" Us

YouTube

TAROM

IBM

PayPal

The phrase "Ok Google", as well as possible derivatives such as "Ok Google Now" and
"Ok Glass", require their own particular spelling of "okay". This spelling is unique to
these cases.

Ok Google

Ok Google Now

Ok Google, unde este Starbucks?

Ok Google, dovleci

Spellings of common Brand and Product names

1D, 3DS, 4chan, Abba
Adidas, Aldo, Amazon, Android Market
Angry Birds, Babies "R" Us, Barclays, BBC One
Black & Decker, Black Ops 2, BlackBerry, Blink-182
Burger King, Casio, Chanel, Chrome
Citroën, Claire's, Coca-Cola, CrossFit
DirecTV, Domino's, Dragon Quest IX, Droid Razr
e-cigarette, Earthlink, easyJet, eBay
eHarmony, Evernote, Facebook, FIFA
Flickr, Formula 1, Gmail, Google
Google Apps, Google Calendar, Google Earth, Google Images
Google mail, Google Search, Google Street View, Google Toolbar
GSMArena, GSX-R/4, GTA V, Häagen-Dazs
Haribo, Hawk-Eye, HobbyKing, HomeShop18
Hotmail, IKEA, iMac, IMDb
iOS, iPad, iPhone, iPlayer
iThemes, ITV Player, Jefit, Kellogg's
Kit Kat, Land Rover, LazyTown, LEGO
LEGOLAND, LinkedIn, LOVEFILM, Maroon 5
McDonald's, Megabus, Mickey D's, Minecraft
Mini, Mk4, NAPA Auto Parts, Nesquick
Netflix, NeXT, Nike, Odeon
Oral-B, Picasa, Plants vs. Zombies, PlayStation 4
PlayStation 1, Politico, PornHub, Porsche
post office (unless referring to a specific place, i.e. Ann Arbor Post Office), PowerPoint, PS4, Ray-Ban
RealPlayer, Rolls-Royce, Samsung Galaxy, Samsung Galaxy S II
Samsung Galaxy S III, Samsung Galaxy S4, Samsung Galaxy S5, SimCity
Siri, Smart car, SpongeBob SquarePants, Starbucks
T-Mobile, T.J. Maxx, Texas hold 'em, TobyGames
PewDiePie, TomTom, Tour de France, Toys "R" Us
Travelodge, Tumblr, Twitter, Virgin Media
Visa, WhatsApp, WrestleMania XXX, WWE '13
Xbox, Xbox 360, Xbox One, Yahoo
YouPorn, YouTube, Zagat, ZBox

Media title

Refer to the Google Play Store for official spellings of media titles. For film/television,
IMDb is also available. If an utterance is ambiguous between a media title and a
sentence or web search, use your judgment for which is more likely; if truly unclear,
default to media title.

Do not use quotation marks for media titles.

screenshots pentru Call of Duty: Black Ops 2

Pune Diamonds de Rihanna.

Los Angeles Times

Multiple spellings

When multiple spellings are attested, use the first spelling used in the reference
dictionary for your language. If there is no entry, Google the word and use the form
with the most hits.

Ciocnim ouă de Paște.
NOT: Ciocnim ouă de Paști.
"Paște" is preferred by DEX.

Transcribe onomatopoeia when clearly spoken. Otherwise, use the [skip] tag in
PeraPera or select the appropriate reason from the 'Cannot transcribe' menu in
Crowd Compute.

miau Person says "miau".

[skip] Person mimics a cat.

Transcribe slang and colloquialisms as spoken according to the appendix on this
page. Do not alter non-standard speech that the speaker probably wouldn't want
corrected.

Sunt și eu p-acilea.
NOT: Sunt și eu pe aici.
"sunt și eu p-acilea"

Mă iubește femeile.
NOT: Mă iubesc femeile.
"mă iubește femeile"
common slang expression

Write commonly accepted contractions as usual. Transcribe contractions when you
hear them spoken.

Mergem s-o vedem pe bunica.
Speaker said "s-o". Transcribe "să o" only when the speaker actually says two distinct words.

Nu-mi pasă.
Speaker said "nu-mi". Transcribe "nu îmi" only when the speaker actually says two distinct words.

Use standard spelling for reductions that commonly occur in normal running speech,
like "lui" for "lu'" or "l-elisions" in neuter and masculine noun endings.

casa lui George "casa lu' george"

Am pierdut trenul.
NOT: Am pierdut trenu'.
"am pierdut trenu"

If you hear a word that does not sound like a standard word of your language, but it is
obviously based on real words, suffixes, or prefixes, transcribe as is.

interpretiza Speaker meant "interpreta" but added a suffix.


Difficult utterances
Everything relating to problematic utterances (background noise, false starts, etc.) or different
language varieties.

Skipping a prompt

The instructions in this section are for PeraPera. In Crowd Compute, instead of
tagging as [skip] the utterances that cannot be transcribed, click the 'Cannot
transcribe' button and select the appropriate reason.

If the prompt cannot be understood, skip it (tag it as [skip]). It is preferable to skip
rather than mistranscribe.

Skip the utterance if it: contains at least some word(s) that cannot be understood; is in
a different language typically not understood; contains no speech; contains only
laughter; contains singing; contains only synthesized speech (e.g. the voices of Google
Now or Siri) and/or pre-recorded speech (e.g. TV or radio).

For utterances that contain both user-generated speech and pre-recorded or
synthesized speech, transcribe user-generated speech and ignore the
pre-recorded/synthesized speech.

Cum e vremea în Oakland?
User asks, "Cum e vremea în Oakland?" Machine responds, "Sunt 70 de grade
Fahrenheit și este soare în Oakland."

If the speaker sings, [skip]. Use the tag [music] if an entire utterance is music from an
instrument, radio, TV, etc.

[skip] if audio contains only laughter. Ignore laughter that is interspersed with speech
(transcribe only the speech).

Profanity should be fully transcribed. However, feel free to skip a sentence that you
feel uncomfortable transcribing.

If the context of an alpha-digit sequence suggests it may be a password, credit card
number, social security number, etc., then use [skip].

Hesitations and truncations

Do not transcribe false starts unless they are complete words.

nepopular "nepop- nepopular"

mai mare decât "ma- mai mare decât"

mai mai mare decât "mai [pause] mai mare decât"

Complete words that have been truncated only if a very small portion of the word is
missing (one syllable or less in a multisyllable word) and it is obvious what the word
should be. In cases of ambiguity, do not transcribe the cut-off word. Do not put
punctuation at the end of truncated words.

Trăiesc în San Francisco
"trăiesc în san francisc-"
Final sound "o" was truncated.

Dinamo București
"inamo bucurești"
Initial sound "d" was cut off.

Transcribe repeated words as many times as uttered, but [skip] if a phrase is repeated
more than five times.

Vreau să cumpăr cumpăr o pelerină de ploaie.
NOT: Vreau să cumpăr o pelerină de ploaie.
Transcribe what is said. Since speaker said "cumpăr" twice, transcribe it twice.

[skip]
"bună bună bună bună bună bună"
The audio contains a phrase repeated more than five times.
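
The repetition threshold above can be expressed as a run-length check. This sketch uses a hypothetical function name and only detects a single word repeated back to back. Multi-word phrases would need a more general comparison.

```python
def should_skip_for_repetition(spoken, limit=5):
    """Return True if one word occurs more than `limit` times in a row."""
    run = 0
    previous = None
    for word in spoken.lower().split():
        # Extend the current run or start a new one.
        run = run + 1 if word == previous else 1
        if run > limit:
            return True
        previous = word
    return False


print(should_skip_for_repetition("bună bună bună bună bună bună"))
```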

Do not transcribe filler words unless intended by the speaker to be transcribed.
Never lengthen them.

Dacă tu zici. "[sigh] dacă tu zici"

Era gen total dezinteresat. "era gen total dezinteresat"

Păi, nu știu.
NOT: Păăăăăăi, nu știu.
"păăăăăăi nu știu"
Do not lengthen filler words.

E telefonul tău? "e ăăă telefonul tău"

Background and foreground speech

Only transcribe foreground speech. A user's speech may go from the foreground to
the background or vice versa (determined by change in volume) and can be
accompanied by change in speaker audience.

pirania
Speaker says loudly, "pirania" and then quietly, "Tu doar spui cuvântul și caută pirania."

Găsește Coffee Society în Cupertino. Acolo mergem, nu?
The speaker changes audience but not volume, so transcribe both sentences.

If two people take turns, without overlap, and are both in the foreground at roughly
the same volume, transcribe the speech of both speakers. Separate the dialogue of
different speakers with end punctuation.

Vii și tu? Da.
First foreground speaker asked "Vii și tu?", other foreground speaker answered "Da."

If two or more people are speaking at once with no one clearly in the foreground, tag
as [overlapping]. Do this for overlaps longer than one second. Use this tag even when
one person is a bit louder than the other(s) and you can tell what they're saying.

Foreign language

If an utterance is in a foreign language, tag with [skip], unless it is an easily identifiable
media title or a foreign language phrase commonly understood in the transcription
language. Stick to the capitalization and punctuation conventions of your target
language.

Ciao
This is a very common phrase that most Romanian speakers know, spelled as in Italian.

Un pahar de Chardonnay, vă rog.
Many Romanian-speakers know the French word "Chardonnay".

Accents

Correct non-standard pronunciations to their standard ones. Non-standard
pronunciations could be from speakers of regional dialects, language learners, or
speakers from different countries.

frecvent
Person said "fregvent" with an accent.

I s-a stricat chiuveta.
NOT: I s-a stricat ghiuveta.
Person said "ghiuveta" with an accent.
