VIDEO SUPERFRESH RELEVANCE JUDGEMENT GUIDELINES

Contents
Cheat Sheet: Based on analysis of common judge errors seen in over 5000 judgments in 8 languages (the most common traps to avoid as a judge!!)
OVERVIEW
UNDERSTANDING QUERY / USER INTENT
UNDERSTAND/EVALUATE THE QUERY AND THE RESULTS
The two most important things that you can do to be a successful judge:
- Understand the most likely intent for the query. What different kinds of video results is the user looking for?
- Understand the types of video results for the query and sort them into buckets of different levels of usefulness.
Important Final notes:
VIDEO RELEVANCE RATING SCALE
Specific Rules for Specific Query Segments
Segment: Movie Titles, Movie Franchise Queries
Segment: Television Show Queries
Segment: Musicians and Band Name Queries
Segment: Song Title Queries
Segment: Sports Queries
LANGUAGE JUDGEMENT

4.1.2022 Update: Alert: you will occasionally be shown pages where the video result publication date is
unavailable and the answer to the freshness question is not pre-loaded into the hitapp. In such cases you
are expected to do the research (by inspecting the inline loaded video result page, or by opening the
page in a new tab) and enter the correct response. Users who answer the freshness label question without
proper research will be banned from the hitapp.

25.02.2022 Update: For videos that are geo-restricted from playing in your region, don’t blindly rate
them as “videodidntplay”. Try to judge based on the metadata and other information about the video, such as
subscriber counts, source channel name, and video views, and rate them accordingly
(highlyrelevant/relevant/related/notrelated etc. instead of blindly choosing “videodidntplay”). See the
section on geographically restricted videos at the end of this guideline for details.

Note: This task and guideline are nearly identical to the video relevance judgment task. The main difference
is that there is an additional question about the FRESHNESS (the publication date) of the video. The
answer to the FRESHNESS question will be auto-filled based on the video publication date in most
cases. In a few rare cases, the answer to the FRESHNESS question will be empty.

- If the FRESHNESS answer is auto-filled, you just need to verify that it’s correct by checking the
publication date mentioned on the video. If the auto-filled answer is wrong, you need to change it.

- For the rare cases (<10%) where the video publication date is missing and the freshness answer is not
auto-filled, you need to visit the page, find the video publication date, and answer the question
on FRESHNESS.

Warning! Since many video results on this hitapp would be hosted on pages from news sites, you need
to pay careful attention to the video on the web page and judge the relevance of that video to the
query. If the web page is pertinent to the query but the video is on a totally different topic, the hit
should be demoted accordingly. See example for Not Related and Video didn’t play.

Warning: Answering the freshness question correctly is critical. For more than 90% of the cases, the
freshness label is auto-populated and you just need to do a sanity check. For rare cases, you may have
to investigate and label. See guideline below.

Each video result is listed with its Video Url, its Video pub date, and a Freshness Label. For every
result, the Freshness Label is one of the same five options:
- April 12 or newer
- April 10 to April 11
- April 6 to April 9
- March 13 to April 5
- Older than March 13
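
The freshness options listed above amount to a date-range lookup. The following is a minimal sketch of that lookup, not part of the hitapp. The function name `freshness_label` is hypothetical, and the year 2022 is an assumption (the guideline gives only day and month for the bucket boundaries).

```python
from datetime import date

# Bucket boundaries copied from the freshness options above. The year is NOT
# stated in the guideline; 2022 is assumed only to make the dates concrete.
YEAR = 2022
BUCKETS = [
    (date(YEAR, 4, 12), "April 12 or newer"),
    (date(YEAR, 4, 10), "April 10 to April 11"),
    (date(YEAR, 4, 6), "April 6 to April 9"),
    (date(YEAR, 3, 13), "March 13 to April 5"),
]
OLDEST_LABEL = "Older than March 13"


def freshness_label(pub_date: date) -> str:
    """Return the freshness bucket for a video publication date.

    Buckets are checked from newest to oldest; the first lower bound that
    the publication date reaches decides the label.
    """
    for lower_bound, label in BUCKETS:
        if pub_date >= lower_bound:
            return label
    return OLDEST_LABEL


print(freshness_label(date(2022, 4, 11)))  # April 10 to April 11
print(freshness_label(date(2022, 3, 1)))   # Older than March 13
```

When the auto-filled answer is present, the same comparison can be used as a sanity check against the publication date shown on the video.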

Cheat Sheet: Based on analysis of common judge errors seen in over 5000
judgments in 8 languages (the most common traps to avoid as a judge!!)
- Beware of cases where the video on page is unrelated to the article. Pages on websites and news
sites will have articles containing a video, where the article is related to the query but the video
is totally unrelated. Such cases should be properly demoted.
- Understand query intent before judging: Especially for broad queries, ambiguous queries, or
vague queries, and for all queries in general. Understanding user intent and the different types of
useful results is key to making accurate judgments.
- The language question can be tricky and needs to be judged carefully (please read the language judgement
guideline below, and listen to the audio of the video before judging).
- Queries with navigational intent (e.g. “youtube”, “nytimes”, “facebook”) are tricky to judge;
please see the guideline examples below to judge them correctly.
- Don’t reach a conclusion about the relevance rating merely by seeing the video title:
o A video can occasionally be highly relevant or relevant even though the keywords don’t
match the query. Understanding query intent and skimming through the video for a few
seconds can be helpful here.
o Conversely, there are also cases where the video title matches the query, but the video is a
spam video, or is devoid of any meaningful content (often shorter than 20-30 seconds). A
video lacking any meaningful content should be demoted to related or not-related, even
if the keywords match the query.
- Location-specific relevance: For some queries, the location impacts the meaning of the query and
the expected results (see the notes on region and location below). For example, for the query “crepe cafe gufo” by a user in Japan, the
user is looking for content about the Japanese café (in Hiroshima), and results about the “Crepe
Café” in Ranchi, India should be considered irrelevant for users in Japan.
- Some specific rules for music, movie, TV show, sports etc.: For singer name queries (e.g. “taylor
swift”), users’ highest preference is to see official music videos of the singer’s songs, and for sports team
/ sports player queries (e.g. “messi”, “barcelona”) the highest preference is to see recaps/sports
footage. Judge the video types accordingly.
OVERVIEW
You will be shown video search queries, and video result urls. Your task is to label how satisfied users would
be with those video results for the query. To provide correct answers, you need to deeply understand user
intent, as well as watch some portion of the video. Your judgments will be used to understand how well
our search engine is meeting user needs.

You cannot make accurate judgements solely based on the title of the page / video, as there are
many cases where the video is not related to the page content. Please ensure that you do proper due
diligence when labeling or you may be banned from the hitapp and/or UHRS.
For each query/result pair, you will have to answer two questions:
1. How relevant is the video to the user’s query?
o Is it Highly Relevant, Relevant, Related, or Not Related?
2. Does the language of the video degrade the usefulness of the results?
o Does the language of the video match what the user would expect? Or is it in a
secondary (less useful) language, or in a language totally unknown to the user?

UNDERSTANDING QUERY / USER INTENT


Assume that the user is searching for videos, i.e. the query was typed in YouTube, Bing Video,
or Google Videos.
For example, for the query “python” on Pinterest, most users are looking for images of snakes, whereas for
“python” on YouTube, most users are looking for the programming language (use Google Trends in the
query locale, and do your research on Bing, Google Videos, and YouTube to understand user intent). Update: A
brief description of the user intent will be given next to the query. You can use it to understand what
kind of results are useful.

UNDERSTAND/EVALUATE THE QUERY AND THE RESULTS


It’s critical to understand query intent before judging relevance. Some notes below:
• Queries may contain acronyms. Understand them.
o Example: For query “wow” the result “world of warcraft” might seem Not Related
to query intent. However, on searching Google/Bing/YouTube it’s clear that many people
issue this query to see World of Warcraft videos.
• Queries may be in a foreign language (in rare cases). Should you encounter a foreign
language query we expect you to try and provide a well-reasoned judgement. You are not being
asked to guess the relevance label (please don’t guess) but are being asked to apply some
thought before giving up and labeling as Cannot Judge.
o Important Note: Most commonly, the foreign queries you see will be the
proper noun of a foreign-language entity, like “Y Tu Mamá También”, “La Vie en rose”, or
“Akihabara”. For these, we expect you to properly label the results.

The two most important things that you can do to be a successful judge:
- Understand the most likely intent for the query. What different kinds of video results is the user
looking for?
- Understand the types of video results for the query and sort them into buckets of different
levels of usefulness.
o Example: For the query “Never gonna give you up by Rick Astley”:
o The best possible result is the song’s official music video; this can be “Highly
Relevant”.
o The next most useful results are videos of this song that aren’t the official music video,
or videos containing a portion of the song; these can be “Relevant”.
o Videos that discuss the song, without having the song audio, are the next most useful;
these can be “Related”, as they are not about the main intents for the query
but might still be of interest to a user.
o Nearly all other results for this query will be Not Related, as they will have
nothing to do with the query.
As illustrated in the example above, you should conceptualize the kinds of results which exist for a query, weigh
them in terms of overall usefulness, and then fit them into the rating scale provided.
Important Final notes:
• Some queries will have many Highly Relevant results, and some queries won’t.
• For song name queries, there will probably be one or two “best in class” /
HighlyRelevant videos.
• Some queries may be broad, vague, or ambiguous. Here’s a checklist for them:
• Broad queries (e.g. “China”, “tiger”, “flamingo”, “yellowstone national park”) →
such queries need very comprehensive results to be treated as highlyrelevant. Use
highlyrelevant for broad queries only if the results are sufficiently
comprehensive. Otherwise use a lower rating.
• Ambiguous queries (e.g. “black panther”, “sherlock holmes”, “Apple”) → for such
queries, understand the major intent and rate results accordingly. Videos perfectly
satisfying a minor intent (e.g. Apple as a fruit for “apple”) can be relevant at best.
Videos perfectly satisfying a rare intent (e.g. Apple as the name of a person for “apple”)
can be related at best.
• Vague queries (e.g. “its”, “cal”, “am”, “church door”) → for vague queries, you
are advised to avoid extreme ratings.
• Impact of region and location on the query intent: The query’s region and location
impact the major intent of the query in a couple of ways:
• It can change the meaning of the query: For example, users searching for “football” in
en-GB are looking for soccer results, while in en-US they are looking for NFL /
American football results. Users searching for “chat” in fr-FR are looking for cats
and not chatrooms.
• It can add a new interpretation for implicit regional and implicit local queries:
E.g. users searching for “coronavirus” in en-IN are looking for news that would be
relevant for Indian users. Users searching for “how to file tax returns” in en-GB are
looking for content that would be relevant in en-GB. Users searching for “tamarind
restaurant” in Dubai are looking for videos about the Dubai restaurant and not the
restaurant in Mumbai.
• Judgment for queries with full and partly navigational intent:
• Fully navigational intent: Queries like “nytimes”, “espn”, “foxnews videos” have
fully navigational intent. For such queries, any video from the corresponding site
or youtube/facebook channel can be considered highly relevant.
• Partly navigational intent: Queries like “youtube”, “facebook”, “dailymotion”,
“instagram”, “twitter” have two or three types of intents, of which the navigational
intent is one of the important intents. The navigational intent should be judged as
per the language and location settings, i.e. a user making navigational query for
“facebook” in USA is expecting to see English language content from Facebook
relevant to American users, and a user making navigational query “youtube” in
France/Germany/India is expecting to see locally relevant content.
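
The rule for ambiguous queries in the notes above (a minor intent can be relevant at best, a rare intent can be related at best) works as a cap on the rating. A minimal sketch follows. The function name `capped_rating` and the intent keys `"major"`, `"minor"`, and `"rare"` are hypothetical names chosen for illustration, and deciding which intent a video serves remains the judge's research.

```python
# Ratings ordered from least to most useful, following the rating scale.
RATING_ORDER = ["Not Related", "Related", "Relevant", "Highly Relevant"]

# Highest rating allowed for each kind of intent on an ambiguous query.
# The keys are hypothetical labels chosen for this sketch.
INTENT_CAP = {
    "major": "Highly Relevant",
    "minor": "Relevant",
    "rare": "Related",
}


def capped_rating(rating: str, intent: str) -> str:
    """Lower `rating` to the cap for `intent` if it exceeds that cap."""
    cap = INTENT_CAP[intent]
    if RATING_ORDER.index(rating) > RATING_ORDER.index(cap):
        return cap
    return rating


# A video that perfectly covers apples as a fruit for the query "apple"
# satisfies a minor intent, so it can be Relevant at best.
print(capped_rating("Highly Relevant", "minor"))  # Relevant
print(capped_rating("Highly Relevant", "rare"))   # Related
print(capped_rating("Related", "minor"))          # Related
```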
VIDEO RELEVANCE RATING SCALE
Below is our guideline for the meaning of the Relevance Ratings.
Important Note: Below guidance applies for most types of queries, but there are some query
segments with more specific rules. The guidelines for specific segments will immediately follow the
rating scale definition.
Relevance Rating: Explanation

Highly Relevant
• Video answers query 100%
• Video is categorically among the best video results for the query in terms of meeting
user intent.
• There are no significantly better videos for answering the user’s intent.
• Does not offer anything additional that would broaden or narrow the query or
lessen user satisfaction.
• Video is a “best in class” answer to the query.

Relevant
• Video matches the intended subject; however, it contains additional less-useful
content detracting from user intent.
• Part of video (not entire video) matches intended subject.
o e.g., Query is for opening scene of a movie and result is full movie.
• Video is partial/incomplete version of intended subject.
o e.g., Query is for a full movie and video is showing part 2 of 3 of movie.
• Video fully satisfies a minor interpretation of query (such results can at
best be labeled relevant)
o e.g., Query is “Apple” and the video is discussing the fruit.

Related
• Video fully satisfies a rare interpretation of query (such results can at best be
labeled related)
o e.g., Query is “Amazon” and result is about Amazon warrior women.
• Video doesn’t match query’s intent but contains related content that may be of
interest to user.
o e.g., Query is looking for "how to change oil and oil filter" and result is
"how to change oil pan gasket."
• Query asks for specific content from a specific source and result is correct
content but from wrong source.
o e.g., Query is "cartoons on YouTube" and the result is "cartoons"
on dailymotion.com.
• Correct subject, but some qualifiers in query not matched (such as date, video
type etc.)
o e.g., Query asks for a concert at a specific location/date and result
matches the location, but day is wrong by 1-2 days.
• Satire or parody of intended subject when such parody is not explicitly
requested.
o e.g., Query asks for a celebrity and the result is a meme of that celebrity.
• Video is an unexpected slideshow with audio of the intended subject.
o e.g., Query is “winter”, and the result is a slideshow of winter scenes
(do note that slideshows may be useful for education and other topics, and
useful slideshows shouldn’t be downgraded)
• Video is a static image with audio of intended subject.
o e.g., Query is “cat”, and the result is a static image of a cat with audio
of a cat meowing

Not Related
• Video is unrelated to query intent.
• Page is related to the query intent topic, but video on the page is unrelated to
query intent (e.g. query is “hunter biden emails” and the result is an MSN article
containing a video about Rudy Giuliani’s email probe. The video about Rudy Giuliani
should be judged as NOT RELATED).
• Result is a spam video.
o e.g., the video says it is a movie, but rather than showing the correct
content the uploader tries to take you to another website to view the
content.

Detrimental
This label indicates illegal or adult video content. A video should be rated Detrimental if any of
the following applies:
- X-rated content: Full frontal nudity, genitals, women’s nipples, sexual intercourse,
pornography.
- Real footage of extreme graphic violence unsuitable for broadcast news (e.g., a
beheading).
- Child Porn.
NOTE: If a query has adult intent, the video should be rated on relevance. Detrimental should
only be used when the query does not have adult intent, but the video contains adult content.

Video Didn’t Play
• Video does not load, either on page or by clicking URL.
• Page contains image(s)/thumbnails of video players, but no actual video player.
• Page does not contain a video (page might contain an article relevant to a
query, but if it doesn’t have a video, rate it as video didn’t play).
• Page contains an audio file without video.
It is possible that videos of Live Events don’t play after the event is over. In such a case, use
other indicators like video title and source (YouTube # of subscribers, source name etc.) to decide
whether the live video was accessible at the time of search. If yes, and the video was relevant to
the live event/topic, then label the video as “Highly Relevant” instead of “Video Didn’t play”.

Cannot Judge
• Video is in a foreign language and you cannot provide a well-reasoned judgement.
• The query is in a language you don’t understand, and the translation of the query doesn’t make
sense.
• Do not mark English queries as foreign.

Specific Rules for Specific Query Segments


Below are the guidelines for several specific query segments. These guidelines supersede the general rating
scale provided above but should mostly align with the general rating scale.
Segment: Movie Titles, Movie Franchise Queries
(e.g. “avengers”, “avatar”, “terminator judgment day”, “Joker movie”, “gladiator”)

Type of Video Highly Relevant Relevant Related Not Related
Complete Movie X
Official Trailer / Commercial X
Portion of complete movie (e.g., Part 1/3) X
Movie review X
Cast interview related to the movie X
Behind the scenes, making of, outtakes from the movie’s production X
Fan made content (including fan made movie trailers) X
Wrong movie X

Segment: Television Show Queries


(e.g. “Game of thrones”, “breaking bad”, “Indian Idol”)
Type of Video Highly Relevant Relevant Related Not Related
Any Complete Episode X
Clip of an entire segment from the TV Show X
Official trailer, commercial, or preview of upcoming episode X
Review of TV show X
Cast interview related to the production of the TV show X
Any fan made content related to the show X
Wrong show X

Segment: Musicians and Band Name Queries


(e.g. “taylor swift”, “Michael jackson”, “the beatles”)
Type of Video Highly Relevant Relevant Related Not Related
Official music video where musician/band is the primary focus X
Official music video where artist is featured, but is not the primary song writer X
Live performance (when not an official music video) X
Audio of musician with static image, slide show, or lyrics from an unofficial source X
Audio of a whole album or multiple songs X
Video contains only a portion of a song by the musician/band X
Remix of correct musician by a different artist X

Segment: Song Title Queries


(e.g. “taylor swift lover”, “hello”, “give in to me”)
Type of Video Highly Relevant Relevant Related Not Related
Any video of the song from official source X
Live performance of the song X
Audio of the song with static image, slideshow, or lyrics X
Remix of the song by original artist X
Cover of the song by an amateur musician X
Partial song audio X
Video discussing the song X
Correct musician, wrong song X

Segment: Sports Queries


This should be interpreted as referring to queries for: sport names, sports team names, sports matches,
sports leagues, and organizations. For all other sports related queries, please follow the guidelines under
the Relevance Rating Scale.
(e.g. “barcelona”, “liverpool”, “real madrid vs Atletico madrid”, “nba”, “nfl”, “premier league”)
Type of Video Highly Relevant Relevant Related Not Related
Complete game, professional level highlights, or professional game recaps, and analysis by top level experts X
Postgame interviews with players and press conferences X
Promotional videos for the team or for a specific match X
Commentary from fans and other fan made content X
Results from a video game X

LANGUAGE JUDGEMENT
Please listen to the audio of the video for a few seconds to answer this question correctly.
The goal is to ascertain whether the video language (audio/title and captions) matches the language
expected by user. You need to listen to the audio to answer this correctly. A user from UK issuing queries
like “iphone review”, “biden inauguration”, “things to do in Amsterdam” would expect English language
results. Hindi results for such queries would not be useful for this scenario (Hindi isn’t widely spoken in the
UK).
List of labels and their definitions:
• Does not degrade: use this when query language matches the result
• The query is in English and the video has English audio/text (listen to the audio
to label correctly)
• The query is for a foreign language, but the video has appropriate subtitles that
make the video understood by someone who speaks the language of the hitapp.
• e.g. Query is "Amelie film" (a French film) and the video has audio in French
but subtitles in English
• Query is for foreign language song, singer, band, tv show, news site,
Youtube channel.
o For “despacito”, user expects the results to be the Spanish language song.
o Query for Korean TV show is expected to show results with Korean audio
and preferably English subtitles.
o Query for German site “prosieben” expects videos in German; query for
“rebeka wing” (German YouTuber) expects German results.
o For query “extradicion del el mencho” in USA, results are expected in
Spanish.
• There is no audio/audio is not needed to understand the video.
o e.g. Query is "Primitive Technology" (a channel that uploads videos
without dialogue /subtitles, but neither of these are needed to understand the
content) and result is from this channel.
o e.g. Query is "Mimes" and result is a mime routine.
o e.g. Query is for sports footage and result has correct content but foreign
language commentary.
• Partially degrades: Use this when the result doesn’t match the query language but is in a
secondary language of the region:
o For example, if the query was in Spanish in the USA, and the result was in English.
o If the query was in Spanish in Spain (or Italian in Italy) and the result was in English.
English can be considered a secondary language in most regions.
• Severely degrades: Use this label when the wrong language makes the results useless,
because that language is not commonly spoken in the region. E.g. for the English query
“harden stepback review” in en-US, if the results have audio in Indonesian (which is not
commonly spoken in the USA).
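
The three language labels above can be read as a short decision procedure. The sketch below is illustrative only. The function name `language_label`, its parameters, and the two-letter language codes are assumptions introduced here, and working out which languages are expected or secondary for a given query and region is still the judge's task.

```python
def language_label(video_lang, expected_langs, secondary_langs,
                   audio_needed=True, has_usable_subtitles=False):
    """Pick a language label for one video result.

    video_lang: language of the video audio, e.g. "es" (codes are illustrative)
    expected_langs: set of languages the user would expect for this query
    secondary_langs: set of other languages commonly understood in the region
    audio_needed: False when the video can be understood without audio
    has_usable_subtitles: True when subtitles in an expected language exist
    """
    # No audio needed, or subtitles make the video understandable.
    if not audio_needed or has_usable_subtitles:
        return "Does not degrade"
    # Audio is in a language the user expects for this query.
    if video_lang in expected_langs:
        return "Does not degrade"
    # Wrong language, but one that is a secondary language of the region.
    if video_lang in secondary_langs:
        return "Partially degrades"
    # Wrong language that is not commonly spoken in the region.
    return "Severely degrades"


# Spanish query in the USA, English result.
print(language_label("en", {"es"}, {"en"}))   # Partially degrades
# English query in en-US, Indonesian audio.
print(language_label("id", {"en"}, set()))    # Severely degrades
# Sports footage with foreign-language commentary.
print(language_label("pt", {"en"}, set(), audio_needed=False))  # Does not degrade
```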

Guideline examples for navigational queries


Query (region): Youtube (en-US)
User intent: to view popular videos from Youtube.com that are useful/popular among USA users.
Video url: (link) (link)
Relevance Rating: Highly relevant. These results meet the requirement of being
from Youtube.com, and are useful/popular among USA users. Language doesn’t
degrade.

Query (region): Youtube (en-US)
User intent: to view videos from Youtube.com that are useful/popular among USA users.
Video url: (link) (link) (link)
Relevance Rating: Related. These videos meet the requirement of being from
youtube.com, but they aren’t popular or useful among USA users. In fact these
videos are from very random channels and unknown content creators. Hence the
rating should not be higher than related.

Query (region): Facebook (en-US)
User intent: can be (a) to view videos from facebook.com that are useful/popular among USA users,
or (b) to view news & info/how to videos about Facebook the company or its products.
Video url: (link) (link)
Relevance Rating: Highly relevant. These results meet the requirement of being
from Facebook.com, and are useful/popular among USA users. Language doesn’t
degrade.

Query (region): Facebook (en-US)
User intent: can be (a) to view videos from facebook.com that are useful/popular among USA users,
or (b) to view news & info/how to videos about Facebook the company or its products.
Video url: (link) video url
Relevance Rating: Related. While these results meet the requirement of being
from Facebook.com, they aren’t popular/useful for USA users. They are random
videos from obscure content creators; hence, they should be labeled no higher than
related.

Query (region): Facebook (en-US)
User intent: can be (a) to view videos from facebook.com that are useful/popular among USA users,
or (b) to view news & info/how to videos about Facebook the company or its products.
Video url: (link) (link)
Relevance rating: Highly relevant. These results meet the requirement of
“news/information or how to videos” about Facebook the company or
product/service. Hence, these are Highly Relevant.

Query (region): Facebook (en-GB)
User intent: can be (a) to view videos from facebook.com that are useful/popular among GB users,
or (b) to view news & info/how to videos about Facebook the company or its products.
Video url: (link)
Language rating: Severely degrades, as videos in Spanish are not useful in the UK.

Query (region): Nytimes (en-US)
User intent: to view videos from the New York Times (website or youtube/facebook channel).
Video url: (link) (link)
Relevance rating: Highly relevant (as these videos meet the requirement of being
from nytimes.com or the New York Times channel on youtube/facebook).

How to judge videos that are geographically restricted in your region, when the query region doesn’t match your region as a judge

Query: Saturday Night Live
Query region: en-US
Judge region: IN (assume judge based in India)
Result url: https://www.youtube.com/watch?v=z6YzhQt0v28
(the uploader has not made the video available “IN YOUR COUNTRY”)
Judgment: This video is not playable in India, but it can be assumed that it’s playable in
the USA. Further, based on metadata like channel name (official Saturday Night Live
channel, subscriber count 12 mn, view count 180K and likes 3K) it can be
inferred that this is a good video. It can be rated highlyrelevant instead of
videodidntplay.
Final judgment: highlyrelevant AND NOT videodidntplay
Correct answer: highlyrelevant

Query: Barca vs Napoli champions league
Result url: https://www.youtube.com/watch?v=qq7MSjXHRTU
(the uploader has not made the video available “IN YOUR COUNTRY”)
Judgment: This video is not playable in India, but it can be assumed that it’s playable in
the USA. Further, based on metadata like channel name (official CBS Sports channel,
subscriber count 450 K, view count 830K and likes 7.7K) it can be inferred that
this is a good video. It can be rated highlyrelevant instead of videodidntplay.
Final judgment: highlyrelevant AND NOT videodidntplay
Correct answer: highlyrelevant

Query: Champions league goals
Result url: https://www.youtube.com/watch?v=enSw17A9-Eo
Judgment: Based on channel and video metadata (likes, subs count, view count) this can be
inferred to be a good video.
Final judgment: highlyrelevant AND NOT videodidntplay
Correct answer: highlyrelevant
