MPE

Not a scientific-statement dataset, but a multi-premise entailment (MPE) dataset of 10k sentences. It is a
variant of the standard textual entailment task in which the premise text consists of multiple
independently written sentences, all describing the same scene (captions of the same FLICKR30K
image). The task is to decide whether the hypothesis sentence 1) can be used to describe the same
scene (entailment), 2) cannot be used to describe the same scene (contradiction), or 3) may or may not
describe the same scene (neutral).

Example:
Premises: 1. Two girls sitting down and looking at a book.
2. A couple laughs together as they read a book on a train.
3. Two travelers on a train or bus reading a book together.
4. A woman wearing glasses and a brown beanie next to a girl with long brown hair holding a book.
Hypothesis: Women smiling. ⇒ENTAILMENT
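
The three-way task above can be pictured as a small record type. The sketch below is illustrative only: the class name, field names, and the premise-joining helper are assumptions made for this note, not part of any official MPE release, and concatenating the premises is just one naive way to feed them to a standard single-premise entailment model.

```python
from dataclasses import dataclass
from typing import List

# The three labels described in the task definition.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass
class MultiPremiseExample:  # hypothetical container, not an official schema
    premises: List[str]  # independently written captions of one image
    hypothesis: str
    label: str  # one of LABELS

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


def to_single_premise(example: MultiPremiseExample, sep: str = " ") -> str:
    """Join all premises into one string (a naive baseline, assumed here)."""
    return sep.join(p.strip() for p in example.premises)


example = MultiPremiseExample(
    premises=[
        "Two girls sitting down and looking at a book.",
        "A couple laughs together as they read a book on a train.",
        "Two travelers on a train or bus reading a book together.",
        "A woman wearing glasses and a brown beanie next to a girl "
        "with long brown hair holding a book.",
    ],
    hypothesis="Women smiling.",
    label="entailment",
)
joined = to_single_premise(example)
```

Note that no single premise in this example mentions both women and smiling, which is why the premises have to be considered together.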

ANLI

Not a science-domain dataset, but a diverse-genre one with paragraph-length premises and a single-statement
hypothesis to be proven. Size: 103k.
Example:
"premise": "Javier Torres (born May 14, 1988 in Artesia, California) is an undefeated Mexican
American professional boxer in the Heavyweight division.
Torres was the second rated U.S. amateur boxer in the Super Heavyweight division and a
member of the Mexican Olympic team.",
"hypothesis": "Javier was born in Mexico"

Best Model: Transformer-based models such as GPT, XLNet, and RoBERTa.

QASC

8-way multiple-choice QA with 9000 questions, along with two annotated fact sentences per question
for inference.

Example:
What can trigger immune response?

(A) Transplanted organs (B) Desire (C) Pain (D) Death

fS: Antigens are found on cancer cells and the cells of transplanted organs.
fL: Anything that can trigger an immune response is called an antigen.
Composed fact / hypothesis: Transplanted organs can trigger an immune response.

Best Model: T5 model pretrained with multiple QA datasets (UnifiedQA)

eQASC

Annotated data for QASC questions with valid and invalid explanations. 98k annotations.

Best model: BERT over generalized reasoning chains (delexicalized chain representation).
Example of a generalized chain: X can cause Y AND Y can start Z → X can cause Z
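
To make the idea of a delexicalized chain concrete, here is a minimal sketch that replaces the content terms of a chain with variables. It assumes the term-to-variable mapping is already given; how eQASC actually selects the terms is not covered in these notes, and the function name and the toy sentences are made up for illustration.

```python
import re
from typing import Dict, List


def delexicalize(sentences: List[str], term_to_var: Dict[str, str]) -> List[str]:
    """Replace each known term with its variable (hypothetical helper).

    Keys of term_to_var are expected to be lowercase.
    """
    # Try longer terms first so a multi-word term wins over a shorter one.
    terms = sorted(term_to_var, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        pattern.sub(lambda m: term_to_var[m.group(0).lower()], s)
        for s in sentences
    ]


# Toy chain (invented for illustration, not taken from eQASC).
chain = [
    "Storms can cause flooding",
    "flooding can start erosion",
    "storms can cause erosion",
]
mapping = {"storms": "X", "flooding": "Y", "erosion": "Z"}
generalized = delexicalize(chain, mapping)
```

With this mapping the toy chain reduces to the same pattern as the generalized chain above: two facts sharing Y, and a conclusion linking X to Z.
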

SciTail

Entailment dataset created from science exam QA, with 27k entailment and neutral pairs.
Example:

HYPOTHESIS: Stems transport water to other parts of the plant through a system of tubes.

SUPPORTING PREMISE (ENTAILS): Water and other materials necessary for biological activity in
trees are transported throughout the stem and branches in thin, hollow tubes in the xylem, or wood
tissue.

NON-SUPPORTING PREMISE (NEUTRAL): Cut plant stems and insert stem into tubing while stem
is submerged in a pan of water.

Best Model: DeBERTa (Decoding-enhanced BERT with disentangled attention), 98% accuracy

SciQ

Question-answering dataset with supporting evidence for most questions. The SciQ dataset contains 13,679
crowdsourced science exam questions about Physics, Chemistry and Biology, among others. The
questions are in multiple-choice format with 4 answer options each. For the majority of the questions,
an additional paragraph with supporting evidence for the correct answer is provided.

Question: "What is an area of land called that is wet for all or part of the year?"

Answer options: "distractor1": "plains", "distractor2": "grassland", "distractor3": "tundra",
"correct_answer": "wetland"

Support: "A wetland is an area that is wet for all or part of the year. Wetlands are home to certain
types of plants."
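
Since each record stores the correct answer and the three distractors in separate fields, the 4-option question has to be assembled before it is shown to a model. A minimal sketch, assuming the field names from the example above; the function name and the seeded shuffle are choices made for this note only:

```python
import random
from typing import Dict, List, Tuple


def build_choices(record: Dict[str, str], seed: int = 0) -> Tuple[List[str], int]:
    """Return the shuffled answer options and the index of the correct one."""
    options = [
        record["distractor1"],
        record["distractor2"],
        record["distractor3"],
        record["correct_answer"],
    ]
    # A seeded generator keeps the option order reproducible across runs.
    random.Random(seed).shuffle(options)
    return options, options.index(record["correct_answer"])


record = {
    "question": "What is an area of land called that is wet for all or part of the year?",
    "distractor1": "plains",
    "distractor2": "grassland",
    "distractor3": "tundra",
    "correct_answer": "wetland",
}
options, answer_index = build_choices(record)
```

Shuffling matters because otherwise the correct answer would always sit in the same position.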

EntailmentBank

Dataset of 1500 multi-step entailment trees (annotated explanation proofs) for science question-answer
pairs. It defines 3 tasks, each asking for the correct entailment tree that explains the question's answer,
given a) only the correct facts (the gold tree leaves), b) a mix of relevant and irrelevant facts, or
c) a full corpus of sentences.

Example:
"context": "sent1: florida is a state located in the united states of america
sent2: united states is located in the northern hemisphere
sent3: the winter in the northern hemisphere is during the summer in the southern hemisphere
sent4: the south pole is tilted toward the sun
sent5: summer is when a hemisphere is tilted towards the sun
sent6: the south pole is located in the southern hemisphere"
"question": "Drew knows that Earth is tilted on its axis. He also knows this tilt is responsible for the
season that a region on Earth will experience. When the South Pole is tilted toward the Sun, what
season will it be in Florida?"
"answer": "winter"
"hypothesis": "it is winter in florida"
"proof": "sent4 & sent5 -> int1: it is summer in south pole; int1 & sent6 -> int2: it is summer in
southern hemisphere; int2 & sent3 -> int3: it is winter in the northern hemisphere; sent1 & sent2 ->
int4: florida is located in the northern hemisphere; int3 & int4 -> hypothesis"
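
The "proof" string encodes the entailment tree as a sequence of steps. The sketch below splits it into structured steps, assuming the format seen in the example above (steps separated by ";", premises joined by "&", and "->" before the conclusion). The function names are hypothetical, and the parser would fail on intermediate text that itself contains ";" or "->".

```python
from typing import Dict, List


def parse_proof(proof: str) -> List[Dict]:
    """Split a proof string into steps (format assumed from the example)."""
    steps = []
    for chunk in proof.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        lhs, rhs = chunk.split("->", 1)
        premises = [p.strip() for p in lhs.split("&")]
        # The conclusion is either "intN: text" or the bare word "hypothesis".
        node_id, _, text = rhs.strip().partition(":")
        steps.append(
            {"premises": premises, "conclusion": node_id.strip(), "text": text.strip()}
        )
    return steps


def leaf_ids(steps: List[Dict]) -> List[str]:
    """Premises that no step concludes, i.e. the leaves of the tree."""
    concluded = {s["conclusion"] for s in steps}
    return sorted({p for s in steps for p in s["premises"]} - concluded)


proof = (
    "sent4 & sent5 -> int1: it is summer in south pole; "
    "int1 & sent6 -> int2: it is summer in southern hemisphere; "
    "int2 & sent3 -> int3: it is winter in the northern hemisphere; "
    "sent1 & sent2 -> int4: florida is located in the northern hemisphere; "
    "int3 & int4 -> hypothesis"
)
steps = parse_proof(proof)
leaves = leaf_ids(steps)
```

For this example all six context sentences end up as leaves, which corresponds to task a), where only the gold leaves are given.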

StrategyQA

2780 open-domain boolean (yes/no) reasoning questions. Each example in StrategyQA is annotated with a
decomposition into reasoning steps for answering it, and Wikipedia paragraphs that provide evidence
for the answer to each step.
Example:
“Can a honey bee sting a human more than once?”
Answer: No
Explanation: Human skin is tough, and the bee’s stinger gets lodged in the skin. The
stinger becomes separated from the bee which dies soon after.

HotpotQA/BeerQA

113k Wikipedia-based question-answer pairs with the sentence-level supporting facts required to
reach the answer. It is unclear whether this counts as an explanation-type dataset: the supporting
paragraphs can be too long, and answers have to be deduced after reading multiple Wikipedia
paragraphs.
Example:
"answers": ["Azriel Páez"],
"question": "What professional boxer has a father who's also a boxer and a former circus performer?"
"context":
"Jorge Páez", "Jorge Adolfo Páez (born October 27, 1965) is a Mexican actor, circus performer and
former professional boxer. In boxing he held the WBO and IBF featherweight titles. Paez's nickname
of \"El Maromero\" is in honor of the somersault (referred to in Spanish as \"maroma\") acts he
performs at the circus. It was in the circus that he learned acrobatic moves he would later use in the
boxing ring. Páez is also the father of Azriel Páez, Jorge Páez Jr., and Airam Páez."
"Azriel Páez", "Azriel Páez (born May 28, 1989 in San Luis Rio Colorado, Sonora, Mexico) is a
Mexican professional boxer in the Welterweight division. His father is the former WBO, IBF
Champion Jorge Páez and his brother Jorge Páez, Jr. is the current WBC Youth
Intercontinental Welterweight Champion."
