Telugu Guidelines - Mihup Tool


a. General Guidelines
• Transcribers and QAs must reach a minimum accuracy of 99%. The simple practices listed
here help meet that standard and prevent obvious errors.
• Native Telugu words and numbers should be written in Telugu script; for instance, నాకు
ఐదు వందల ఇరవై ఆరు రూపాయల నగదు కావాలి
• Spell each word consistently; use one standard spelling throughout.
• Check the spelling of each word; it should be correct according to standard usage.
• To keep the conversation coherent, place appropriate punctuation where it belongs.
• Native English words must be written with their standard English spelling in Roman script;
for instance, ఈరోజు కూడా భారీ వర్షాలు కురిసే అవకాశం ఉందని weather office
తెలిపింది
• All abbreviations and single English letters must be in uppercase; for instance, నేను
minimum balance maintain చేయకుంటే నా bank ఖాతా block చేయబడుతుంది. మీరు
మీ పేరును R A M C H A R A N లాగా వ్రాయగలరా?
• Named entities (i.e., person, place, or organization names) must be written in Roman
script inside angle brackets “<>”, for instance, , , , and . పర్యటన
చాలా ఉపయోగకరంగా ఉందని అన్నారు. For more clarity, take a look at the
following examples:
- Incorrect: నా సోదరుడి మొబైల్ నంబర్ 66292988
- Correct: నా సోదరుడి mobile number six six two nine two nine eight eight
- Incorrect: ఎమర్జెన్సీ కాల్ చేయడానికి మీ ఫోన్ ని ఉపయోగించండి
- Correct: emergency call చేయడానికి మీ phone-ని ఉపయోగించండి
- Incorrect: అమెరికా అధ్యక్షుడు జో బిడెన్ పాటు ప్రపంచ సీనియర్ నేతలు
ఢిల్లీ సమావేశమవుతారు.
- Correct: అధ్యక్షుడు పాటు ప్రపంచ senior నేతలు లో సమావేశమవుతారు.
• Spell-check your work carefully. Error-ridden data will not be accepted. Portions that
cannot be transcribed and that degrade the audio quality, such as overlapping speech,
inaudible speech, and noise, should be trimmed from the audio.
• While transcribing, you will often encounter non-verbal sounds such as laughter and
applause. These sounds usually have no effect on the flow of speech or on the meaning of
the conversation and should therefore be trimmed from the audio.
• Listen to the audio you are transcribing and make sure the transcription makes sense.
Sometimes the surrounding speech provides enough context to identify a word you are
having trouble with.
• Be careful with your punctuation, and check that your sentences are coherent.
• Make sure names are spelled correctly. When in doubt, search online (for example, with
Google); this is especially important for files with many references to product or place
names.
• Finally, and most crucially, ensure that your transcription is accurate. Do not rush
through difficult parts. If you have tried to identify the speech but are still stumped,
use the tags provided in this style guide.
• While transcribing, always listen to the audio through headphones.
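
Two of the rules above (no numerals, and uppercase for single English letters) are mechanical
enough to pre-check before submission. The following is only a rough sketch: the function
name lint_line and the warning texts are hypothetical and are not part of the Mihup tool.

```python
import re

# Hypothetical pre-submission checks based on the guidelines above.
# ASCII digits should not appear: numbers are spelled out as spoken.
DIGIT = re.compile(r"[0-9]")
# A lowercase Roman letter standing alone (not part of a word or contraction).
LOWER_SINGLE_LETTER = re.compile(r"(?<![A-Za-z'’])[a-z](?![A-Za-z'’])")


def lint_line(text: str) -> list[str]:
    """Return guideline warnings for one transcript line (empty list if none)."""
    warnings = []
    if DIGIT.search(text):
        warnings.append("digits found: spell the number out as spoken")
    if LOWER_SINGLE_LETTER.search(text):
        warnings.append("lowercase single English letter: use uppercase")
    return warnings


print(lint_line("నా సోదరుడి mobile number 66292988"))
```

The letter check also flags the English article "a", so its warnings are prompts for a
human review, not automatic errors.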

Note: Before tagging anything as [inaudible], [bad quality], [background noise], or
[incomprehensible], please make an effort to listen to the word or phrase again. It also helps
to take into account what is being said. If a word or phrase makes no sense in context, it is
probably wrong.

b. Filler Words

Stutters, filler words, and crutch words are common features of human speech. They are usually
used when a speaker is searching for a thought or deciding how to express it properly.
Common examples of crutch words and fillers include um, ihh, aaa, and uh.
You can usually omit crutch words and fillers, judging by the context of their use and the way
they are spoken. Consider the following:
Incorrect: aaa.., i'm thinking of your other ideas and some of the, um, suggestions you made
earlier.
Correct: i'm thinking of your other ideas and some of the suggestions you made earlier.
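
As a rough illustration of the rule above, the sketch below strips the four example fillers
from a line of text. The filler list and the function name strip_fillers are assumptions made
for this example; a real filler still has to be judged by context and by how it is spoken, so
the result should be reviewed by the transcriber.

```python
import re

# Illustrative filler list taken from the examples above; extend as needed.
FILLERS = ("um", "uh", "ihh", "aaa")

# A filler token, plus trailing dots and the commas/spaces around it.
_FILLER_RE = re.compile(
    r"[,\s]*\b(?:" + "|".join(FILLERS) + r")\b\.*,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove listed filler words and tidy the leftover spacing."""
    cleaned = _FILLER_RE.sub(" ", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


line = "aaa.., i'm thinking of your other ideas and some of the, um, suggestions you made earlier."
print(strip_fillers(line))
# i'm thinking of your other ideas and some of the suggestions you made earlier.
```

The pattern also removes a comma that directly precedes a filler, and a sentence-final
period that directly follows one, so punctuation should be rechecked after it runs.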

c. Transcription Style Guide


Transcription work has a set of problems unique to the task of copying out the spoken word.
Audio files can suffer from poor recording quality; speakers are often inaudible, and there are
sometimes non-verbal sounds that need to be trimmed.
When transcribing, you will often encounter non-verbal sounds such as laughter and applause.
Often, non-verbal sounds will have no effect on the flow of speech or on the meaning of
conversation and should therefore be ignored in the transcription and trimmed from the audio
file.

d. Important
Some audio files may contain technical language or refer to products, objects, or places that
are unfamiliar to you. QAs and transcribers are expected to make an effort to get unfamiliar
words or phrases right by searching for them online. If you still cannot be sure of the correct
word after searching, either guess the word and place it in square brackets or insert the
[?] tag, as outlined above. Before tagging anything as inaudible or indecipherable, please
listen to the word or phrase again before giving up. It also helps to take into account
what is being said. If a word or phrase makes no sense in context, it is probably wrong.

e. Numbers
• Numbers should be spelled out (one, two, three... ten).
• When writing phone numbers, do not type the numerals (e.g., 437-8242-376); spell the
number out as it is spoken in the audio.
• When writing percentages, do not use the % symbol (e.g., 50%); write out the word
"percent".
• When writing fractions, do not use numerals; e.g., write three-fifths instead of 3/5, or a
quarter instead of 1/4.

2.2 Audio Quality Check/Audio Validation
• The organization or team must have its own QC team that performs a quality check on
each part of the data.
• Data is sent to the Final QC team if and only if the organization's QC team approves it.
This is an automated process.
• Data must be submitted with proper QC. If the WER (Word Error Rate) stays high for
three consecutive weeks, we will have to reject the data along with the ID.
• The error report affects the organization or team. Each organization receives a score
based on the quantity, quality, and number of IDs in the data.
• Renewal of the agreement depends on the scorecard.
• No payment will be made for a rejected ID.
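
The guideline does not say how WER is computed or what counts as "high". A minimal sketch,
assuming the conventional word error rate (word-level edit distance between the reference and
the submitted transcript, divided by the number of reference words), might look like the
following. The function name and the whitespace tokenization are assumptions of this example,
not the tool's method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, the 99% accuracy standard corresponds roughly to a WER of 0.01, if
accuracy is taken to mean one minus WER; that mapping is also an assumption.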
