Forester English (Canada) Annotation Guidelines 2023 - ADAP QF
General guidelines
This task requires that you listen to audio files and classify them using the radio button
labels below. You do not need to type out what the speaker says. This is done later.
Please read the guidelines below carefully, and if anything is unclear, contact your
project manager.
Using labels
The aim of this task is to identify files that have enough intelligible English speech from
a native Canadian speaker in them to be worthwhile transcribing.
We want to target audio from Canadian accented English speakers only and to minimise
the amount of other speech that gets transcribed. Files that also contain a noticeable
amount of other English accents or other languages should therefore be labelled
other-english or foreign, respectively.
If the file can be transcribed, listen to it again to check whether it contains personal
information (see UII below). If it does not, add the transcription label and move on.
Make sure to listen to all of the speech in case there is UII that needs to be marked.
Please use headphones when working on this task. This will ensure you can hear the
audio clearly. Set your volume to a comfortable level (80%) so that it is not so loud that
your ears hurt, but also not so quiet that you might not hear important sounds and
speech.
NOTE: All information provided in this document is confidential. Any
publication, provision, or dissemination of this content is strictly
prohibited. Do not share or post the contents on the internet.
Label list
Label Reason
other-english
The audio contains more than 10% of English speech
that is NOT in Canadian accented English.
Transcription
The majority of the audio contains Canadian accented
English speech that can be transcribed. This includes
people speaking, and also words coming from the
television or radio.
Examples:
● 50% silent + 50% Canadian English = Transcription
● 50% silent + 40% Canadian English + 10% Australian English = other-english
Singing
The majority of the audio contains only human singing.
Corrupted
The majority of the audio has static or is machine
augmented/contorted in a way that does not sound like
a human voice.
Explicit
Use this label if the audio contains explicit/graphic
content such as pornography, extreme violence, or hate
speech. Note that a recording containing bad language
alone (i.e. without violence/harassment/hate speech)
should not be considered as containing explicit content.
1. sexually explicit
2. violence
3. harassment/threats
4. suicide or self-injury
5. hate speech
6. terrorism
7. blasphemy
8. other
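The arithmetic behind the other-english threshold and the two percentage examples in the label list can be sketched as follows. This is an illustration only, not part of the annotation tool. It assumes the 10% threshold is measured against the English speech in the file rather than against the whole audio, because that reading agrees with both the rule ("more than 10%") and the example (10% Australian English out of 50% speech is 20% of the speech). The function names and the `threshold` parameter are hypothetical. If in doubt about how to apply the threshold, contact your project manager.

```python
def non_canadian_share(canadian: float, other_english: float) -> float:
    """Return the share (0.0 to 1.0) of English speech that is NOT Canadian accented.

    Both arguments are percentages of the whole audio file.
    Silence is ignored (assumption: the threshold applies to speech only).
    """
    speech = canadian + other_english
    if speech <= 0:
        return 0.0  # no English speech at all, so nothing to compare
    return other_english / speech


def is_other_english(canadian: float, other_english: float,
                     threshold: float = 0.10) -> bool:
    """True when more than `threshold` of the English speech is non-Canadian."""
    return non_canadian_share(canadian, other_english) > threshold


# Example 1: 50% silent + 50% Canadian English
print(is_other_english(canadian=50, other_english=0))   # False, so Transcription

# Example 2: 50% silent + 40% Canadian English + 10% Australian English
print(is_other_english(canadian=40, other_english=10))  # True, so other-english
```

Under the alternative reading, where the 10% is measured against the whole audio, the second example would sit exactly at 10% and would not exceed the threshold, which is why the sketch does not use it.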
Extra notes: