Training - Interactive Preference Collection
Main Goal:
The goal of Interactive Preference Collection is to hold high-quality multi-turn conversations in which
two different Agents each provide a possible response.
● You begin or continue the conversation with a prompt you generate based on the
provided instructions.
● Each Human prompt you generate and each response provided by the
Agent is considered one turn. Both Agents provide a response in each turn.
● Evaluate both responses and determine which response is better and by how much,
then provide an overall quality score for both responses.
The use of ChatGPT or any other LLM or AI Assistant is completely prohibited in this project.
Anyone found using ChatGPT will be removed immediately.
you create it. we globalize it.
Who to Contact
Platform support (e2f Support):
● Via email: vendorsupport@e2f.com
● Via chat: https://jobs.e2f.io/

For any questions or concerns related to context:
● PMs: Ngoc & Julie
● Email: annotation@e2f.com

For questions about rates/contracts:
● Chi (kig@e2f.com)
● Michelle (mlt@e2f.com)
● Email: vendors@e2f.com
Simplified Instructions
Read the instructions and begin or continue the conversation. Evaluate which Agent
response is better in each turn and by how much, then provide an overall score
evaluation for both responses.
Note:
● Agent responses should be helpful, factually accurate, and not contain any potentially
sensitive or harmful content.
● If both responses convey the necessary information, are factually correct, and are not
harmful, the better response is the one that best satisfies the original request, has better style,
better wording, or is more concise.
In the Response Quality Score evaluation step, quality scores range from 1 to 7, where 7 is
Great and 1 is Terrible, according to the definitions and requirements in the Response
Quality Score table. In general, responses with higher scores should be helpful, relevant,
engaging, and factually correct. Responses that convey incorrect information, are off-topic,
or are nonsensical should receive lower scores.

5 (Mediocre): Truthful, non-toxic, helpful, and neutral in tone. Although it does not fully
answer the question, it is still relevant, factually correct, and helpful. Example: zero
spelling, grammar, or punctuation errors; could be a little more comprehensive, but is still
helpful and fully satisfies the Human's request.

3 (Bad): Does not completely fulfill the ask or adhere to the instructions. Is unhelpful or
factually incorrect. Contains grammatical or stylistic errors. A response with a 3-rating has
at least one of the following violations:
● At least one (1) spelling or grammar error.
● Does not contain a disclaimer, if one should have been included.
● Does not meet all parameters.
● Provides false information or advice.
● Is not helpful to the Human or does not adhere to instructions.

1 (Terrible): Is irrelevant to the dialog history, or nonsensical. Contains sexual, violent, or
harmful content, or personal data. The response is empty, wrong, or nonsensical. Assign a
1-rating automatically if:
● The response is empty.
● The response is nonsensical.
● The response is irrelevant.
● The response contains any sexual, violent, harmful, or personal info.
● The best response should be ranked highest and the worst response ranked lowest. If two
responses are similar in quality, you may select a tie.
● Agents sometimes generate factually incorrect responses. Try to fact-check the information in the response
with a quick Google search. If a response has factually incorrect information, rank it lower.
● Ties are allowed: this means that there may be occasions where there is a tie in response quality. That said, they
should not be frequent occurrences.
● In a scenario where both responses are of similar quality, rank them based on which response fully answers or
satisfies the prompt in the most helpful, well-formed, clear, logical, and natural manner. Keep in mind, as
stated above, some responses may be ranked the same.
● If a response is truncated, meaning it appears to stop in the middle of a sentence or word, rank subjectively
based on its quality compared to the rest of the responses as normal.
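The ranking rule above (the response with the higher quality score wins, and ties are allowed but should be infrequent) can be sketched in a few lines of Python. This is purely illustrative and not part of any official tooling; the function name and labels are hypothetical.

```python
def preference(score_a: int, score_b: int) -> str:
    """Map two Response Quality Scores (1-7, where 7 is Great and
    1 is Terrible) to a preference label for the pair of responses.
    """
    for score in (score_a, score_b):
        if not 1 <= score <= 7:
            raise ValueError("quality scores range from 1 to 7")
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    # Ties are allowed, but should not be frequent occurrences.
    return "tie"
```

For example, a response scored 5 against one scored 3 would be preferred, while two responses both scored 4 would be a tie.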
Examples
Example 1:
Prompt: “How many miles away is the Earth from the Moon?”
Example 2:
Prompt: “How many miles away is the Earth from the Moon?”
Example 3:
Prompt: “How many miles away is the Earth from the Moon?”
Writing Tips
Realistic and diverse Human behavior. Create realistic, meaningful, challenging Human
utterances that you can imagine a real Human saying during a conversation. A list of
some potential Human behaviors that can be used (do not limit to these):

Think of other Human behaviors. How would you engage with the Agent in conversation?
The ultimate goal is to generate realistic dialogs.
Dialogue Example:
Human: I'd like to talk about Taylor Swift
Agent: Sure, I've heard that she's a very talented singer and songwriter. What would you like to know about her?
Quality Assurance
After the initial annotator’s work has been completed, the task will be sent to another party, who will act as a
reviewer. They will have a chance to check the ratings of the first annotator and provide feedback or adjust the
ratings. When that step is complete, it will be reviewed once more by a QA checker.
Review
● Ensure the file has good quality, is free of errors, and is ready for delivery. If the
Annotators’ answers do not meet the requirements, Reviewers must update and/or
rewrite them accordingly.
● Evaluate the Annotators’ answers. The focus is on high quality and adherence to
the guidelines.

QA Check
● Ensure the file has good quality, is free of errors, and is ready for delivery.
● Evaluate the Annotators’ work and provide feedback. The focus is on high quality
and adherence to the guidelines. Fix any errors.
● Evaluate the Reviewer’s work and provide feedback regarding quality, accuracy
against guidelines, and constructive ratings.
Thank you