Span

Step 1
Determine if the span contains factual claims.

A span does contain factual information when:
 The span contains specific factual information
 e.g. "Tripping over pets causes more than 86,000 falls each year in the United States
that are serious enough to result in a trip to the emergency room."
 The span makes general claims about the world or contains common knowledge
 e.g. "Most major cities have activities and entertainment for people of all ages."
 The span contains evaluative statements
 e.g. "Rambutan is a delicious fruit."
 The span contains advice and instructions, e.g., "To make pour over coffee, first
measure out roughly 1g of coffee per 16ml of water."
Conversely, a span does not contain factual information when:
 The span is purely conversational
 e.g., "Good night." or "Do you like a good deal?"
 The span is a generic disclaimer
 e.g., "Speak to a doctor before making decisions."
 The span is a first-person opinion
 e.g., "I like cake."
 The span contains phrases that structure the response, including phrases that
introduce a list, summaries of other parts of the response, or conclusions that follow from
other parts of the response
 e.g., for the response "Benefits of coffee include: 1) Brain function: Coffee can improve
…", the span "Benefits of coffee include:"
 The span rephrases the user's input for confirmation
 e.g. "Sure, here are some ideas for a kid’s fourth birthday party:"
 The span only contains links. Do not verify whether the URLs exist or contain factual
information; mark the span as "No factual information".
 The span only contains fictional or creative content that the system made up
 e.g., "Once upon a time, there was a cowboy named Jack."
NOTE: If fictional content is mixed with factual content, or is referred to in spans that mainly
contain factual information, consider the span as conveying factual information, e.g.:
 there is a mix of creative content and factual assertions: "Jack time-traveled back to
1863 when Lincoln gave the Gettysburg Address."
 the span contains non-fictional facts written in artistic styles, e.g., generated poems
about Abraham Lincoln's achievements.
 the span contains verifiable facts about established fictional works, e.g., plot points in
Lord of the Rings.
 Also check the "Corner cases" section for edge-case guidance.
If the span is deemed not to contain factual information, select No factual information as the
rating and move on to the next span; if it does contain factual information, select one of the
Factuality rating values in the next section.
Step 2
Factuality Ratings
If the span contains factual claims
Rate the accuracy of factual spans with one of the following:
"Inaccurate": At least one claim in the span is factually incorrect.
 Use this label when there is any contradicting URL and no supporting URLs for that
claim.
 Some claims contradict common knowledge (e.g. "white is a dark color") or are simple to
disprove (e.g., 'dream' is a 4 letter word). In such cases, rate as "Inaccurate" even when
there are no contradicting URLs.
 When selecting "Inaccurate", an additional question to assess the severity of the
inaccuracy will appear; see "Severity of inaccurate claims" below for guidance.
 At least 2 URLs are required.
"Unsupported": At least one claim in the span is likely made up. If you have spent a reasonable
amount of time (e.g., 10 minutes) researching and found no supporting or contradicting URLs,
and the claim is not clearly correct or incorrect, use this label.
 Use the Comments field to briefly summarize the research process.
"Disputed" (e.g., controversial / mixed opinion / pros and cons): At least one claim has both
supporting and contradicting evidence that cannot be reconciled and does not have a universal
consensus.
 At least 2 URLs are required.
 Add both supporting and contradicting URLs in the fields below.
 Important: Not all subjective claims and opinions should be labeled as "Disputed". If the
statement is agreed upon by virtually all experts ("the earth is round"), mark the claim as
"Accurate." Similarly, if the statement is broadly rejected, mark it as "Inaccurate."
 The main difference between "Unsupported" and "Disputed" is that for "Disputed" you
must have found equally acceptable URLs both supporting and contradicting the span;
whereas for "Unsupported", you must have found NO evidence whatsoever supporting or
contradicting the claims in the span.
 Don't use Unsupported or Disputed when you yourself are unsure about the answer
(use "Can't confidently assess" instead).
"Accurate": All claims in the span are factually correct.
 Use this label when you have found supporting URLs and no contradicting URLs.
 Some claims are common knowledge (e.g. "black is a dark color") or simple to verify
(e.g., 'dream' is a 5 letter word). In such cases, rate as "Accurate".
 At least 2 URLs are required.
"Can't confidently assess":
 The sentence contains claims that require a high level of expertise on the matter to
verify their accuracy, or are too complex (e.g. high-level mathematical operations,
coding, etc.).
 The sentence contains text in a language other than your assigned language.
 The sentence is a hard punt.
Please note!
Important: Main differences between "Inaccurate", "Unsupported" and "Can't confidently
assess":
 "Inaccurate": You found direct contradicting evidence, or a lack of positive evidence strong
enough to clearly indicate that the claim is wrong (e.g., the example above about the latest
film by a famous director).
 "Unsupported": You would normally be able to find this kind of information on the web for
similar claims, but in this case the search results are not enough to determine whether the
claim is true or false.
 "Can't confidently assess": You don't know where to find the right information because of
the complexity of the claim, or the information is not "searchable" (e.g., the result of a
complex mathematical operation that you can't solve by your own means).
Important: Because one span can contain several claims, the following hierarchy determines
which label to assign when the claims receive different labels:
1. If one claim in the span is “Inaccurate”, the overall label is “Inaccurate”.
2. Otherwise, if one claim in the span is “Unsupported”, the overall label is “Unsupported”.
3. Otherwise, if one claim in the span is “Disputed”, the overall label is “Disputed”.
4. Otherwise (i.e., all claims in the span are “Accurate”), the overall label is “Accurate”.
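The hierarchy works like a precedence rule: the span takes the highest-priority label that appears among its claims. Below is a minimal Python sketch of that rule. It assumes each claim has already been given one of the four labels named in the hierarchy. The function name span_label and the label strings are illustrative only and are not part of any rating tool. "Can't confidently assess" is deliberately left out because the hierarchy does not cover it.

```python
from typing import Iterable

# The four per-claim labels covered by the hierarchy (illustrative strings).
CLAIM_LABELS = {"Inaccurate", "Unsupported", "Disputed", "Accurate"}

# Priority order: the first label found among the claims wins.
PRECEDENCE = ("Inaccurate", "Unsupported", "Disputed")


def span_label(claim_labels: Iterable[str]) -> str:
    """Combine per-claim labels into one span label (hypothetical helper)."""
    labels = list(claim_labels)
    if not labels:
        raise ValueError("a factual span must contain at least one claim")
    unknown = set(labels) - CLAIM_LABELS
    if unknown:
        raise ValueError(f"unexpected claim labels: {sorted(unknown)}")
    # Rules 1-3: return the highest-priority non-accurate label present.
    for label in PRECEDENCE:
        if label in labels:
            return label
    # Rule 4: every claim was labeled "Accurate".
    return "Accurate"
```

In Example 2 below, the two claims would be labeled "Inaccurate" and "Accurate", so span_label(["Inaccurate", "Accurate"]) returns "Inaccurate". One correct claim does not offset an incorrect one.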
Severity of inaccurate claims:
When you choose "Inaccurate" as the rating for a span, a new question to assess the severity of
this inaccuracy will appear: Are the non-accurate claims in this sentence providing misleading
information either (1) directly related to addressing the prompt or (2) potentially harmful
to users or other people if relied upon?
The concept of "harmful" is debatable; if you're in doubt, mark the response as "True" and
add a comment with your rationale.

Examples:
Inaccurate:
 Consideration: Mark a span as “Inaccurate” if at least one claim within it is factually
incorrect, contradicted by reliable sources, or goes against common knowledge.
Example 1
Research found that the official website of the US Bureau of Labor Statistics (a trustworthy,
official source) explicitly states that the job outlook is expected to decline; this contradicts
the claim made in the span.

This span needs to be marked as "True" for the question about severe inaccuracy, as it provides
inaccurate information in a sentence that responds directly to the main question in the prompt.
Example 2
From the response, we see that this bulleted item appears in the context of "a list of acting
credits for Christopher Daniel Barnes". The claims should be interpreted with respect to this
context.
Note that two claims are being made in the span: 1) Christopher Daniel Barnes played Spider-Man
in Ultimate Spider-Man; 2) Ultimate Spider-Man ran from 2012 to 2017. The first claim has
contradicting evidence, as shown by the URL provided: the role of Spider-Man was voiced by
Drake Bell, whereas Christopher Daniel Barnes voiced a different character (Electro), which
contradicts the claim in the span.
The second claim (‘Ultimate Spider-Man ran from 2012 to 2017’) is actually correct according to
https://en.wikipedia.org/wiki/Ultimate_Spider-Man_(TV_series), a reputable source. However,
since not all claims are supported, we do not need to provide this URL.
Regarding the severe inaccuracy question, this is rated as "False" because, although the
information is inaccurate, it is neither harmful nor directly responsive to the question in the prompt.
Accurate
 Consideration: Mark a span as “Accurate” if all claims within it are factually correct and
supported by reliable sources.
Here is an example of a correct claim with supporting evidence from the Web:
In this example, we find a trustworthy source (Wikipedia) that explicitly states that Elizabeth
Warren was a Republican from 1991 to 1996. Therefore, we can state that the claims in the span
are "Accurate."
Unsupported
 Consideration: Use “Unsupported” if a claim seems fabricated or if no supporting or
contradicting evidence is found after reasonable research.

Here is an example of a sentence with unsupported claims, i.e., claims that are most likely made up or
"hallucinated" by the chatbot: we can find sources on the topic, but none of them explicitly
supports or contradicts the claims in the span.
A web search looking for movies/TV shows filmed in the restaurant Peter Luger Steak House
does not explicitly mention anywhere that "When Harry Met Sally" or "Seinfeld" were filmed in
this restaurant. Therefore, we can conclude that the claims in the span are hallucinated. URLs
researched for hallucinations should NOT be put in the "Supporting URLs" field. Feel free to note
them in the comments field, however.
Disputed
 Consideration: Label a span as “Disputed” if it contains a claim with both supporting and
contradicting evidence, and no consensus can be reached.
The following are examples of claims that should be considered "Disputed":
 Claims with multiple viewpoints
 e.g., "The French Revolution ended in 1799." has both supporting evidence ("The French
Revolution … ended with the formation of the French Consulate in November 1799.")
and contradicting evidence ("The French Revolution began in 1789 and lasted until
1794."), depending on which event should mark the end of the period.
 Claims with multiple opinions without a consensus
 e.g., "X was a good president." or "X University has a strong academic reputation."
 Important: However, if there is a general consensus, especially when there is
trustworthy evidence for the consensus (e.g., scientific findings or recognition awards),
mark it as either "Accurate" or "Inaccurate".
 e.g., "Harvard University has a strong academic reputation" (Accurate). In such cases, it
is likely that either the supporting or contradicting URLs fields would be empty.
 Claims that depend on the circumstances or are time-sensitive
 e.g., "Commodities, such as gold and oil, are a great investment."
 However, "One investment option is commodities." or "Some people consider
commodities to be a great investment." should be marked as "Accurate" since they leave
possibilities open for other circumstances. Supporting URLs should be provided in this
case.
Can’t Confidently Assess
 Consideration: Use this label for spans that:
 Require specific expertise to verify or are too complex to assess with available
resources
 Are in a foreign language other than your target language
 Are HARD punts (see "Last IMPORTANT notes" below for details on hard vs. soft punts)
 Example: The sentence includes a complex mathematical theorem or a highly technical
medical claim that is beyond general knowledge and requires expert verification.
 Explanation: The claim’s accuracy cannot be confidently assessed without specialized
knowledge in the relevant field.

Last IMPORTANT notes!
UPDATED INSTRUCTIONS: No task should be skipped!
 The response is (partly) in a foreign language, affecting your understanding:
 Select "Can't Confidently Assess"
 The chatbot gives "canned responses" (punts):
There are two types of punts:
 Hard punts: The model replies with a very short answer, usually no more than one or
two short sentences, stating that it cannot reply to the question. Examples:
 “I'm still learning how to answer this question. In the meantime, try Google Search.”
 “I’m a large language model and I can’t answer this question.”
 “I'm sorry, but I can't answer that question. It's a very sensitive topic, and I don't want to
offend anyone. Perhaps you could ask a historian or a religious scholar instead.”
 Soft punts: The model actively refuses to answer the question or ignores it, but provides
a longer answer than a hard punt, which may include, for instance, more about the reasons
why it’s refusing to answer, or additional information about topics related to the question
(without providing a definitive answer). A soft punt is also known as "hedging".
For hard punts, rate as "Can't Confidently Assess"
Please rate soft punts normally.
Please see the detailed guidelines doc for visual examples!
Cheat Sheet
Please use this picture as a cheat sheet.
You can also find the link to the summary of the instructions here.
