Forester Spanish (Chile) Transcription Guidelines
NOTE: All information provided in this document is confidential. Any publication, provision, or dissemination of this content is strictly prohibited. Do not share or post the contents on the internet.
Transcription Guidelines (Pautas de transcripción)
Introduction (Introducción)
Project goal: The goal of this project is to transcribe audio files that will ultimately help our client build state-of-the-art automatic speech recognition models.
The aim of this project is to accurately transcribe (i.e., type out or represent with pre-filled tags) the speech presented to you in audio files. You will be using our online transcription platform, "Ampersand". A separate guide explains how to use Ampersand.
Please read these guidelines in full and keep them handy when you start
transcription. There are a lot of things to remember, but you will find it gets easier
once you have done a few transcriptions. If anything is unclear, please contact your
project supervisor. Good luck!
General information (Información general)
Speech, non-speech noise, and no-speech (Habla, ruido no hablado y no habla)
The purpose of this project is to transcribe all valid speech as well as the non-speech sounds which occur at the same time as speech.
Speech is anything which contains human language. In this project, we transcribe speech even if it is not grammatically correct, including:
● hesitations ("am", "ehh", "mmm"),
● colloquial words ("cachai", "tinca"), and
● repeated words ("es que es que", "haz de cuenta que que").
Example
● TRANSCRIPTION: am sí seguro.
Foreground speech/noise (Discurso en primer plano/ruido)
Your volume settings should be set so that the loudest speaker in the utterance is at a comfortable volume. Foreground speech is any speech which can be clearly understood at that volume, without straining or repeated listening.
Speech and noises which are clearly quieter than this volume should not be transcribed or tagged, even if they are audible and intelligible.
Batch (Lote)
A batch of transcription work is a single, continuous audio file which is further divided into pages and utterances.
Transcribing speech (Transcribiendo discurso)
Use standard Spanish spelling.
This includes characters such as ñ, accented vowels (á, é, í, ó, ú), and the dieresis (ü).
Example
● TRANSCRIPTION: el concepto de oferta-demanda en el
mundo académico-profesional es diferente al que se nos
enseñó.
Capital letters (Letras mayúsculas)
Use Spanish capitalization rules with one exception: do not use a capital letter if the only reason to do so is that the word is at the start of a sentence.
Most person names ("Vicente Fernández"), location names ("El Ángel de la Independencia", "Chile"), products, and brand names ("La Gorda", "YouTube") should be capitalized.
Example
A speaker says a word in Spanish that you don't
understand
TRANSCRIPTION: a ella no le parece que la sea
por nuestra cuenta.
Foreign speech
Example:
A speaker says a foreign word after "el" and you cannot identify the foreign word.
TRANSCRIPTION: creo que el problema es que el
que usamos es diferente.
Example:
A speaker says "we will have lunch together" in the middle of a sentence but you do not understand the words.
Example:
A speaker says "we will have lunch together" in the middle of a sentence and you understand the words.
/!\ Tips:
Numbers (Números)
Numbers should be spelled out as full words, the way they were said.
Examples
The number '1989' may be said in many different ways.
Example
● H2O ==> TRANSCRIPTION: H2O
● iPhone 6S ==> TRANSCRIPTION: iPhone 6S
● PS4 ==> TRANSCRIPTION: PS4
However
● J O S É. TRANSCRIPTION: J O S É acentuada.
Adolfo punto CH.
Inappropriate language (Lenguaje inapropiado)
All inappropriate language should be transcribed. If you feel uncomfortable typing a particular word, use the unintelligible tag (see unintelligible tag) in its place.
Hesitations and interjections (Vacilaciones e interjecciones)
Transcribe hesitations and other disfluencies such as "mmm" and "ah".
List of hesitations/interjections (meaning: acceptable spelling)
● Agreement: ajá, sip
● Disagreement: nah, oh oh, nel
● Hesitation: ehh, am, mmm
● Surprise: guau, wow, ah, ay, yay, éjale
● Seeking confirmation: eh, mhm
● Disgust: bah, uy
● Delight: eh, wow, guau, ah, wao
● Calling someone: ehi, hey, eh, oh, oye, yo
● Emphasizing: eh, oh, ah, uh, ey
Example
Span tag | Shortcut | How to use it
For non-standard words and spellings that often appear in spoken language, transcribe what is heard and highlight the word using the colloquial span tag.
Example
Speaker's pronunciation | Transcription | Full form
Example:
the normal (correct) way, then highlight it.
There is no need to use this tag if someone has an accent; it should only be used when the person accidentally said something the wrong way. When in doubt, ask yourself: "Would this person pronounce the word differently if I asked them to repeat themselves?" If they would, it can be classified as a mispronunciation.
Example
You hear “¿vas a ir a la inglesia?”
TRANSCRIPTION: ¿vas a ir a la iglesia?
Example:
spell correctly by doing a quick online search.
Examples:
/!\ Remember:
If you hear something in Spanish but cannot make out the word at all, use
Example
You hear some speech punctuated by a cough, followed by a 1-second pause, and then a loud noise:
TRANSCRIPTION: nunca lo escuché.
Example
TRANSCRIPTION:
You must ignore all sounds if there is no speech in
the entire utterance.
Use this tag for all non-speech sounds made by a foreground speaker (e.g., breath, cough, lipsmack, and laughter).
(where the person stops talking partway through a word). In a truncation, the recording has cut someone off while they were saying a word. Therefore, truncations only occur at the start or end of an utterance.
When you hear a truncation at the end of an utterance and you can transcribe the word with certainty, write out the truncated word in full.
Example
The word 'probablemente' is split, with "prob-" at the end of the first utterance and "-ablemente" at the beginning of the second utterance.
Example
An unintelligible word is truncated.
If you come across user-identifiable information (UII), do not transcribe those words; insert this tag instead. The purpose is to avoid disclosing a user's private information.
UII includes things like full names, usernames,
gamertags, street addresses, telephone numbers,
credit card numbers, social security numbers, etc.
There are exceptions. You do not need to mark UII if
the information is public, e.g.:
Punctuation (Puntuación)
have a subject and verb include answers to questions (e.g., "sí." and "no."), exclamations ("¡de verdad!"), and questions ("¿cuándo?").
It is also possible to omit the subject and write only the verb when the subject is implicit (e.g., "voy." and "estás llorando.").
At the end of each sentence, use a period (.) for statements, question marks (¿?) at the beginning and end of questions, or exclamation marks (¡!) at the beginning and end of exclamations.
Do not use punctuation combinations ("?!", "!!!", "...").
Example
● TRANSCRIPTION:
UTTERANCE 1: exentar el examen. ¿tú piensas
UTTERANCE 2: lo lograría? quizás si
You do not need to use the incomplete tag when the speaker restarts
or repeats a single word.
Use commas (,) in two situations only:
When unsure whether to use a comma, err on the side of not using
one.
Resources (Recursos)
● Spanish Punctuation Rules
● Capitalization in Spanish
● Spanish Dictionary