
VIDEO SUPERFRESH RELEVANCE JUDGEMENT GUIDELINES

Contents
Cheat Sheet: Based on analysis of common judge errors seen in over 5000 judgments in 8 languages
(the most common traps to avoid as a judge!!)
OVERVIEW
UNDERSTANDING QUERY / USER INTENT
UNDERSTAND/EVALUATE THE QUERY AND THE RESULTS
The two most important things that you can do to be a successful judge:
- Understand the most likely intent for the query. What different kinds of video
results is the user looking for?
- Understand the types of video results for the query and sort them into buckets of
different levels of usefulness.
Important Final notes:
VIDEO RELEVANCE RATING SCALE
Specific Rules for Specific Query Segments
Segment: Movie Titles, Movie Franchise Queries
Segment: Television Show Queries
Segment: Musicians and Band Name Queries
Segment: Song Title Queries
Segment: Sports Queries
LANGUAGE JUDGEMENT
Language judgment examples

Note: This task and its guidelines are nearly identical to the video relevance judgment task. The main difference
is that there is an additional question about the FRESHNESS (the publication date) of the video. In most cases,
the answer to the FRESHNESS question is auto-filled based on the video publication date. In a few rare cases,
the publication date is missing and the answer to the FRESHNESS question is empty.

- If the FRESHNESS answer is auto-filled, you just need to verify that it’s correct by checking the
publication date mentioned on the video. If the auto-filled answer is wrong, you need to change it.

- For the rare cases (<10%) where the publication date is missing and the FRESHNESS answer is not
auto-filled, you need to visit the page, find the video publication date, and answer the
FRESHNESS question.
Warning! Since many video results on this hitapp would be hosted on pages from news sites, you need
to pay careful attention to the video on the web page and judge the relevance of that video to the
query. If the web page is pertinent to the query but the video is on a totally different topic, the hit
should be demoted accordingly. See example for Not Related and Video didn’t play.

Warning: Answering the freshness question correctly is critical. For more than 90% of the cases, the
freshness label is auto-populated and you just need to do a sanity check. For rare cases, you may have
to investigate and label. See guideline below.

Video URL | Video pub date | Freshness Label (choose one)
link | | April 12 or Newer / April 10 to April 11 / April 6 to April 9 / Mar 13 to April 5 / Older than Mar 13
link | | April 12 or Newer / April 10 to April 11 / April 6 to April 9 / Mar 13 to April 5 / Older than Mar 13
link | | April 12 or Newer / April 10 to April 11 / April 6 to April 9 / Mar 13 to April 5 / Older than Mar 13
link | | April 12 or Newer / April 10 to April 11 / April 6 to April 9 / Mar 13 to April 5 / Older than Mar 13
link | | April 12 or Newer / April 10 to April 11 / April 6 to April 9 / Mar 13 to April 5 / Older than Mar 13
link | | April 12 or Newer / April 10 to April 11 / April 6 to April 9 / Mar 13 to April 5 / Older than Mar 13
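The freshness buckets in the sample table can be read as a simple date lookup. The sketch below is only an illustration of how a publication date maps to a label: the function name `freshness_label` is hypothetical, and the year is an assumption supplied by the caller, since the guideline does not state one.

```python
from datetime import date
from typing import Optional


def freshness_label(pub_date: Optional[date], year: int) -> Optional[str]:
    """Map a publication date to one of the five freshness buckets.

    `year` is an assumption: the guideline lists only month and day.
    Returns None when the publication date is missing, which is the rare
    case where the judge must visit the page and find the date manually.
    """
    if pub_date is None:
        return None
    if pub_date >= date(year, 4, 12):
        return "April 12 or Newer"
    if pub_date >= date(year, 4, 10):
        return "April 10 to April 11"
    if pub_date >= date(year, 4, 6):
        return "April 6 to April 9"
    if pub_date >= date(year, 3, 13):
        return "Mar 13 to April 5"
    return "Older than Mar 13"
```

When the auto-filled answer differs from what the date on the page implies, the auto-filled answer is the one to change.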
Cheat Sheet: Based on analysis of common judge errors seen in over 5000
judgments in 8 languages (the most common traps to avoid as a judge!!)
- Understand query intent before judging: Especially for broad queries, ambiguous queries, or
vague queries, and for all queries in general. Understanding user intent and the different types of
useful results is key to making accurate judgments.
- Language question can be tricky and it needs to be judged carefully (please read the guideline
here, and see the examples here, and listen to the audio of the video before judging).
- Queries with navigational intent (e.g. “youtube”, “nytimes”, “facebook”) are tricky to judge;
please see the guideline examples below to judge them correctly.
- Don’t reach a conclusion about the relevance rating merely by seeing the video title
o Video can occasionally be highly relevant or relevant even though the keywords don’t
match the query. Understanding query intent and skimming through the video for a few
seconds can be helpful here.
o Conversely, there are also cases where the video title matches the query, but video is a
spam video, or is devoid of any meaningful content (often shorter than 20-30 seconds). A
video lacking any meaningful content should be demoted to related at best, even if the
keywords match the query.
- Beware of cases where the video on the page is unrelated to the article. Pages on websites and news
sites sometimes have articles containing a video, where the article is related to the query but the video
is totally unrelated. Such cases should be properly demoted.
- Location specific relevance: For some queries, the location impacts the meaning of the query and
the expected results (details here). E.g., for the query “crepe cafe gufo” by a user in Japan, the
user is looking for content about the Japanese café (in Hiroshima), and results about the “Crepe
Café” in Ranchi, India should be considered irrelevant for users in Japan.
- Some specific rules for music, movie, TV show, sports etc.: For singer name queries (e.g. “taylor
swift”), users’ highest preference is to see official music videos of their songs, and for sports team
/ sports player queries (e.g. “messi”, “barcelona”) the highest preference is to see recaps/sports
footage. Judge the video types accordingly.
OVERVIEW
You will be shown video search queries, and video result urls. Your task is to label how satisfied users would
be with those video results for the query. To provide correct answers, you need to deeply understand user
intent, as well as watch some portion of the video. Your judgments will be used to understand how well
our search engine is meeting user needs.

You cannot make accurate judgements solely based on the title of the page / video, as there are
many cases where the video is not related to the page content. Please ensure that you do proper due
diligence when labeling or you may be banned from the hitapp and/or UHRS.
For each query/result pair, you will have to answer two questions:
1. How relevant is the video to the user’s query?
• Is it Highly Relevant, Relevant, Related, or Not Related?
2. Does the language of the video degrade the usefulness of the results?
o Does the language of the video match what the user would expect? Or is it in a
secondary (less useful) language, or in a language totally unknown to the user?

UNDERSTANDING QUERY / USER INTENT


Assume that the user is searching for videos – i.e. the query was typed in YouTube or Bing Video
/Google Videos.
E.g., for the query “python” on Pinterest, most users are looking for images of snakes, whereas for
“python” on YouTube, most users are looking for the programming language (use Google Trends in the
query locale, and do your research on Bing and Google Videos/YouTube to understand user intent). Update: A
brief description of the user intent will be given next to the query. You can use it to understand what
kind of results are useful.

UNDERSTAND/EVALUATE THE QUERY AND THE RESULTS


It’s critical to understand query intent before judging relevance. Some notes below:
• Queries may contain acronyms. Understand them.
o Example: For query “wow” the result “world of warcraft” might seem Not Related
to query intent. However, on searching Google/Bing/YouTube it’s clear that many people
issue this query to see World of Warcraft videos.
• Queries may be in a foreign language (in rare cases). Should you encounter a foreign
language query we expect you to try and provide a well-reasoned judgement. You are not being
asked to guess the relevance label (please don’t guess) but are being asked to apply some
thought before giving up and labeling as Cannot Judge.
o Important Note: Most commonly, the foreign queries you will see are proper nouns
naming a foreign language entity, like “Y Tu Mamá También”, “La Vie en rose”, or
“Akihabara”. For these, we expect you to properly label the results.

The two most important things that you can do to be a successful judge:
- Understand the most likely intent for the query. What different kinds of video results is the user
looking for?
- Understand the types of video results for the query and sort them into buckets of different
levels of usefulness.
o Example: For query “Never gonna give you up by Rick Astley”
o the best possible result will be the song’s official music video– this can be “Highly
Relevant”.
o The next most useful results are videos of this song that aren’t the official music video/
or videos containing a portion of the song – these can be “Relevant”.
o Videos that discuss the song, without having the song audio, are the next most likely
intent – these can be “Related” as they are not about the main intents for the query,
but might still be of interest to a user.
o Nearly all other results for this query will be Not Related, as they will have
nothing to do with the query.
As illustrated in above example, you should conceptualize the kinds of results which exist for a query, weigh
them in terms of overall usefulness, and then fit them into the rating scale provided.
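The bucketing in the example above can be written down as a small lookup. This is a minimal sketch for the one query discussed (“Never gonna give you up by Rick Astley”); the result-kind names are hypothetical labels invented for illustration, not categories defined by the guideline.

```python
# Buckets for the example query, taken from the bullets above.
# The keys are hypothetical names for the kinds of results described there.
RATING_BY_RESULT_KIND = {
    "official_music_video": "Highly Relevant",
    # A video of the song that is not the official one, or only a portion of it.
    "other_video_of_song": "Relevant",
    # A video that discusses the song without containing the song audio.
    "discussion_without_audio": "Related",
}


def rate_result(kind: str) -> str:
    """Return the rating for a result kind; anything else is Not Related."""
    return RATING_BY_RESULT_KIND.get(kind, "Not Related")
```

Other queries need their own buckets; the point is to list the kinds of results first and only then assign ratings.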
Important Final notes:
• Some queries will have many Highly Relevant results, and some queries won’t.
• For song name queries, there will probably be one or two “best in class” /
Highly Relevant videos.
• Some queries may be broad/vague/ambiguous. Here’s a checklist for them:
• Broad queries (e.g. “China”, “tiger”, “flamingo”, “yellowstone national park”) →
such queries need very comprehensive results to be treated as Highly Relevant. Use
Highly Relevant for broad queries only if results are sufficiently
comprehensive. Else use a lower rating.
• Ambiguous queries (e.g. “black panther”, “sherlock holmes”, “Apple”) → for such
queries, understand the major intent and rate results accordingly. Videos perfectly
satisfying a minor intent (e.g. Apple as a fruit for “apple”) can be Relevant at best.
Videos perfectly satisfying a rare intent (e.g. Apple as name of a person for “apple”)
can be Related at best.
• Vague queries (e.g. “its”, “cal”, “am”, “church door”) → for vague queries,
avoid extreme ratings.
• Impact of region and location on the query intent: The query’s region and location
impact the major intent of the query in a couple of ways:
• It can change the meaning of the query: E.g. users searching for “football” in
en-GB are looking for soccer results, while in en-US they are looking for NFL /
American football results. Users searching for “chat” in fr-FR are looking for cats
and not chatrooms.
• It can add a new interpretation for implicit regional and implicit local queries:
E.g. users searching for “coronavirus” in en-IN are looking for news that would be
relevant for Indian users. Users searching for “how to file tax returns” in en-GB are
looking for content that would be relevant in en-GB. Users searching for “tamarind
restaurant” in Dubai are looking for videos about the Dubai restaurant and not the
restaurant in Mumbai.
• Judgment for queries with full and partly navigational intent:
• Fully navigational intent: Queries like “nytimes”, “espn”, “foxnews videos” have
fully navigational intent. For such queries, any video from the corresponding site
or youtube/facebook channel can be considered highly relevant.
• Partly navigational intent: Queries like “youtube”, “facebook”, “dailymotion”,
“instagram”, “twitter” have two or three types of intents, of which the navigational
intent is one of the important intents. The navigational intent should be judged as
per the language and location settings, i.e. a user making navigational query for
“facebook” in USA is expecting to see English language content from Facebook
relevant to American users, and a user making navigational query “youtube” in
France/Germany/India is expecting to see locally relevant content.
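The rule for ambiguous queries in the checklist above (a minor intent can be Relevant at best, a rare intent Related at best) amounts to capping the rating. The sketch below illustrates that cap; the names `SCALE`, `CAP_BY_INTENT` and `cap_rating` are hypothetical, and deciding which intent is major, minor or rare remains the judge's research task.

```python
# Relevance scale ordered from least to most useful.
SCALE = ["Not Related", "Related", "Relevant", "Highly Relevant"]

# Highest rating allowed for a video that satisfies each kind of intent.
CAP_BY_INTENT = {
    "major": "Highly Relevant",
    "minor": "Relevant",   # e.g. Apple as a fruit for the query "apple"
    "rare": "Related",     # e.g. Apple as the name of a person
}


def cap_rating(rating: str, intent: str) -> str:
    """Lower `rating` to the cap for `intent` if it exceeds that cap."""
    cap = CAP_BY_INTENT[intent]
    # Pick whichever of the two ratings sits lower on the scale.
    return min(rating, cap, key=SCALE.index)
```

For instance, a video that perfectly covers the fruit for the query “apple” would come out as Relevant rather than Highly Relevant.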
VIDEO RELEVANCE RATING SCALE
Below is our guideline for the meaning of the Relevance Ratings.
Important Note: Below guidance applies for most types of queries, but there are some query
segments with more specific rules. The guidelines for specific segments will immediately follow the
rating scale definition.
Relevance Rating Explanation
Highly Relevant
• Video answers query 100%
• Video is categorically among best video results for query in terms of meeting
user intent.
• There are no significantly better videos for answering the user’s intent.
• Does not offer anything additional that would broaden or narrow the query or
lessen user satisfaction.
• Video is a “best in class” answer to the query.
Relevant
• Video matches the intended subject; however, it contains additional less-useful
content detracting from user intent.
• Part of video (not entire video), matches intended subject.
o Query is for opening scene of a movie and result is full movie.
• Video is partial/incomplete version of intended subject.
o Query is for a full movie and video is showing part 2 of 3 of movie.
• Video fully satisfies a minor interpretation of query (such results can at
best be labeled relevant)
o Query is “Apple” and the video is discussing the fruit.
Related
• Video fully satisfies a rare interpretation of query (such results can at best be
labeled related)
o e.g., Query is “Amazon” and result is about Amazon warrior women.
• Video doesn’t match query’s intent but contains related content that may be of
interest to user.
o e.g., Query is looking for "how to change oil and oil filter" and result is
"how to change oil pan gasket."
• Query asks for specific content from a specific source and result is correct
content but from wrong source.
o e.g., Query is "cartoons on YouTube" and the result is "cartoons"
on dailymotion.com.
• Correct subject, but some qualifiers in query not matched (such as date, video
type etc.)
o e.g., Query asks for a concert at a specific location/date and result
matches the location, but day is wrong by 1-2 days.
• Satire or parody of intended subject when such parody is not explicitly
requested.
o e.g., Query asks for a celebrity and the result is meme of that celebrity.
• Video is an unexpected slideshow with audio of the intended subject.
o e.g., Query is "winter”, and the result is a slide show of winter scenes
(do note that slideshows may be useful for education and other topics and
useful slideshows shouldn’t be downgraded)
• Video is a static image with audio of intended subject.
o e.g., Query is "cat”, and the result is a static image of a cat with audio
of a cat meowing
Not Related
• Video is unrelated to query intent.
• Page is related to the query intent topic, but video on the page is unrelated to
query intent (e.g. query is “hunter biden emails” and the result is this MSN article
containing a video about Rudy Giuliani’s email probe. The video about Rudy Giuliani
should be judged as NOT RELATED).
• Result is a spam video.
o e.g., the video says it is a movie, but rather than showing the correct
content the uploader tries to take you to another website to view the
content.
Detrimental
This label indicates illegal or adult video content. A video should be rated Detrimental if any of the
following applies:
- X-rated content: Full frontal nudity, genitals, women’s nipples, sexual intercourse,
pornography.
- Real footage of extreme graphic violence unsuitable for broadcast news. (e.g., a
beheading)
- Child Porn.
NOTE: If a query has adult intent, the video should be rated on relevance. Detrimental should
only be used when the query does not have adult intent, but the video contains adult content.
Video Didn’t Play
• Video does not load, either on page or by clicking URL
• Page contains image(s) thumbnails of video players, but no actual video player.
• Page does not contain a video (page might contain an article relevant to a
query, but if it doesn’t have a video, rate it as video didn’t play).
• Page contains an audio file without video.
It is possible that videos of Live Events don’t play after the event is over. In such a case, use the
other indicators like video title, source (YouTube # of subscribers, source name etc) to decide
whether the live video was accessible at the time of search. If yes, and the video was relevant to
the live event/topic then label the video as “Highly Relevant” instead of “Video Didn’t play”.
Cannot Judge
• Video is in a foreign language and you cannot provide a well-reasoned judgement.
• The query is in a language you don’t understand, and the translation of the query doesn’t make
sense.
• Do not mark English queries as foreign.

Specific Rules for Specific Query Segments


Below are the guidelines for several specific query segments. These guidelines supersede the general rating
scale provided above but should mostly align with the general rating scale.
Segment: Movie Titles, Movie Franchise Queries
(e.g. “avengers”, “avatar”, “terminator judgment day”, “Joker movie”, “gladiator”)

Type of Video | Highly Relevant | Relevant | Related | Not Related
Complete Movie X
Official Trailer / Commercial X
Portion of complete movie (e.g., Part 1/3) X
Movie review X
Cast interview related to the movie X
Behind the scenes, making of, outtakes from the movie’s production X
Fan made content (including fan made movie trailers) X
Wrong movie X

Segment: Television Show Queries


(e.g. “Game of thrones”, “breaking bad”, “Indian Idol”)
Type of Video | Highly Relevant | Relevant | Related | Not Related
Any Complete Episode X
Clip of an entire segment from the TV Show X
Official trailer, commercial, or preview of upcoming episode X
Review of TV show X
Cast interview related to the production of the TV show X
Any fan made content related to the show X
Wrong show X

Segment: Musicians and Band Name Queries


(e.g. “taylor swift”, “Michael jackson”, “the beatles”)
Type of Video | Highly Relevant | Relevant | Related | Not Related
Official music video where musician/band is the primary focus X
Official music video where artist is featured, but is not the primary song writer X
Live performance (when not an official music video) X
Audio of musician with static image, slide show, or lyrics from an unofficial source X
Audio of a whole album or multiple songs X
Video contains only a portion of a song by the musician/band X
Remix of correct musician by a different artist X

Segment: Song Title Queries


(e.g. “taylor swift lover”, “hello”, “give in to me”)
Type of Video | Highly Relevant | Relevant | Related | Not Related
Any video of the song from official source X
Live performance of the song X
Audio of the song with static image, slideshow, or lyrics X
Remix of the song by original artist X
Cover of the song by an amateur musician X
Partial song audio X
Video discussing the song X
Correct musician, wrong song X

Segment: Sports Queries


This should be interpreted as referring to queries for: sport names, sports team names, sports matches,
sports leagues, and organizations. For all other sports related queries, please follow the guidelines under
the Relevance Rating Scale.
(e.g. “barcelona”, “liverpool”, “real madrid vs Atletico madrid”, “nba”, “nfl”, “premier league”)
Type of Video | Highly Relevant | Relevant | Related | Not Related
Complete game, professional level highlights, or professional game recaps, and analysis by top level experts X
Postgame interviews with players and press conferences X
Promotional videos for the team or for a specific match X
Commentary from fans and other fan made content X
Results from a video game X

LANGUAGE JUDGEMENT
Please listen to the audio of the video for a few seconds to answer this question correctly.
The goal is to ascertain whether the video language (audio/title and captions) matches the language
expected by user. You need to listen to the audio to answer this correctly. A user from UK issuing queries
like “iphone review”, “biden inauguration”, “things to do in Amsterdam” would expect English language
results. Hindi results for such queries would not be useful for this scenario (Hindi isn’t widely spoken in the
UK).
List of labels and their definitions:
• Does not degrade: use this when query language matches the result
• The query is in English and the video has English audio/text (listen to the audio
to label correctly)
• The query is for a foreign language, but the video has appropriate subtitles that
make the video understood by someone who speaks the language of the hitapp.
• e.g. Query is "Amelie film" (a French film) and the video has audio in French
but subtitles in English
• Query is for foreign language song, singer, band, tv show, news site,
Youtube channel.
o For “despacito”, user expects the results to be the Spanish language song.
o Query for Korean TV show is expected to show results with Korean audio
and preferably English subtitles.
o Query for German site “prosieben” expects videos in German; query for
“rebekah wing” (German YouTuber) expects German results.
o For query “extradicion del el mencho” in USA, results are expected in
Spanish.
• There is no audio/audio is not needed to understand the video.
o e.g. Query is "Primitive Technology" (a channel that uploads videos
without dialogue /subtitles, but neither of these are needed to understand the
content) and result is from this channel.
o e.g. Query is "Mimes" and result is a mime routine.
o e.g. Query is for sports footage and result has correct content but foreign
language commentary.
• Partially degrades: Use this when the result doesn’t match the query language but is in a
secondary language of the region:
o E.g. if the query was in Spanish in USA, and the result was in English.
o If the query was in Spanish in Spain (or Italian in Italy) and the result was in English.
English can be considered a secondary language in most regions.
• Severely degrades: Use this label when the wrong language makes the results useless,
because that language is not commonly spoken in the region. E.g. for an English query in
en-US, “harden stepback review”, if the results have audio in Indonesian (which is not spoken in
USA).

The process for language judgment: Try to do a quick side search on Google Web, Google
Videos, YouTube, Bing Web to understand what the preferred language of the results is.
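The three language labels defined above can be summarized as a short decision procedure. The sketch below is a simplified reading of those definitions, not an official algorithm: the function and parameter names are hypothetical, language codes such as "ja" or "en" are used only for illustration, and the expected and secondary languages for a region still have to come from the side search described above.

```python
from typing import Optional, Sequence


def language_label(
    video_lang: str,
    expected_lang: str,
    secondary_langs: Sequence[str] = (),
    requested_entity_lang: Optional[str] = None,
    audio_needed: bool = True,
    has_understandable_subtitles: bool = False,
) -> str:
    """Pick a language label for one video result (simplified sketch).

    requested_entity_lang: language of an explicitly requested foreign
    entity (song, singer, TV show, site, channel), if the query names one.
    """
    # No dialogue needed (mime routine, sports footage): language is moot.
    if not audio_needed:
        return "Does not degrade"
    # Video matches the expected language, or subtitles make it understood.
    if video_lang == expected_lang or has_understandable_subtitles:
        return "Does not degrade"
    # User explicitly asked for a foreign-language entity.
    if requested_entity_lang is not None and video_lang == requested_entity_lang:
        return "Does not degrade"
    # Wrong language, but one that is secondary in the region (often English).
    if video_lang in secondary_langs:
        return "Partially degrades"
    return "Severely degrades"
```

For the query “iphone” in ja-JP, a Japanese video would come out as "Does not degrade", an English one as "Partially degrades" (with English treated as secondary in Japan), and a French one as "Severely degrades".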

Language judgment examples


Query (user locale) | Sample SERP results | Video URL | Rating with explanation
Rebekah wing (fr-FR) | Sample search results in France | Video url (link) | Does not degrade. The French user explicitly asked for a German language YouTube channel, hence “Does not degrade”, even though German is not spoken in France.
Shakira (ja-JP) | Sample search results in Japan | Video URL (link) | Does not degrade. If a foreign language singer, song, movie, foreign TV show or website is explicitly requested, then it’s okay for results to be in the corresponding foreign language. These results don’t degrade.
Iphone (ja-JP) | Sample search results in Japan | Video url (link) | Does not degrade. The video result is in Japanese, hence it does not degrade.
Iphone (ja-JP) | Sample search results in Japan | Video url (link) | Partially degrades. For universally popular entities like “iphone”, the users in Japan expect Japanese language news and informational videos. Results in English (secondary language in Japan) partially degrade.
Iphone (ja-JP) | Sample search results in Japan | Video results (link) | Severely degrades. The query is in Japan and the results are in French, hence the language severely degrades the quality.
Minecraft (de-DE) | Sample search results in Germany | Video result (link) | Does not degrade. Users expect German results for this query and the linked video is in German, so it does not degrade.
Minecraft (de-DE) | Refer to above | Video result (link) | Partially degrades. As seen from the screenshot above, the expected language of the results is German for query “minecraft”. Since this result is in English (secondary language in Germany), it partially degrades.
Minecraft (de-DE) | Refer to above | Video result (link) | Severely degrades. The result is in Spanish for “minecraft” in Germany. Hence, it severely degrades.
Among us (es-ES) | Sample SERP results in Spain | Video result (link) | Does not degrade. As seen from the SERP alongside, the expected results are in Spanish. The results do not degrade.
Among us (es-ES) | Refer above | Video result (link) | Partially degrades. As seen from the search results above, the expected results are in Spanish, but the current video is in English. Hence it partially degrades.
Among us (es-ES) | Refer above | Video result (link) | Severely degrades. Since the results are in French, which is not spoken in Spain, the label is Severely degrades.
Tom hanks (fr-FR) | Sample SERP results in France | Video result (link), Video result (link) | Does not degrade. There’s ambiguity about the language of the query; however, a look at the SERP results indicates that the users prefer French results. Hence, since the videos are in French, they don’t degrade.
Tom hanks (fr-FR) | Refer to above | Video result (link) | Partially degrades. The result is in English, whereas the query prefers French results. The English results should be labeled “partially degrades”.
Netflix (de-DE) | Sample SERP results in Germany | Video URL: (link) | Does not degrade. As you can see from the sample SERP results in Germany, users prefer to see German language content for this query. Hence, as the result is in German, it doesn’t degrade.
Netflix (de-DE) | Refer to above | Video URL (link) | Partially degrades. Since German users are looking for German content here, the results in English partially degrade for this query.
Netflix (de-DE) | Refer to above | Video URL (link) | Severely degrades, since this is a query in Germany and the results are in Spanish (not commonly spoken in Germany).
Amazon Prime Video (fr-FR) | | Video url (link) |
Amazon Prime video (fr-FR) | | Video url (link) | Partially degrades, as the users are expecting French content and our results are in English.
Amazon Prime (fr-FR) | | Video url (link) | Severely degrades, as the users are looking for French language content and the result is in German.
El Chapo (en-GB) | Sample SERP results in GB | Video url (link) | Severely degrades. The language of the video is Spanish, which is not broadly spoken in UK, so it severely degrades.

Examples for navigational queries


Query (region) | User intent | Video URL, screenshot | Judgment
Youtube (ja-JP) | User intent is to view videos from Youtube.com that are useful/popular among Japanese users. | Video url (link) (link) | Relevance Rating: Highly relevant. These results meet the requirement of being from Youtube.com, and are useful/popular among Japanese users. Language doesn’t degrade.
Youtube (es-ES) | User intent is to view videos from Youtube.com that are useful/popular among Spanish users. | Video url (link) (link) (link) | Relevance rating: Related. These videos meet the requirement of being from youtube.com and being in Spanish, but they aren’t popular or useful among Spanish users. In fact these videos are from very random channels and unknown content creators. Hence the rating should not be higher than related. Language doesn’t degrade.
Facebook (fr-FR) | User intent can be (a) to view videos from facebook.com that are useful/popular among French users, or (b) view news & info/how to videos about Facebook the company or its products. | Video url (link) (link) | Relevance Rating: Highly relevant. These results meet the requirement of being from Facebook.com, and are useful/popular among French users. Language doesn’t degrade.
Facebook (en-US) | User intent can be (a) to view videos from facebook.com that are useful/popular among USA users, or (b) view news & info/how to videos about Facebook the company or its products. | Video url (link) video url | Relevance Rating: Related. While these results meet the requirement of being from Facebook.com, they aren’t popular/useful for USA users. In fact they come from random and obscure content creators, and for this reason, they should be treated as “related”.
Facebook (en-GB) | User intent can be (a) to view videos from facebook.com that are useful/popular among GB users, or (b) view news & info/how to videos about Facebook the company or its products. | Video url (link), Video url (link) | Relevance rating: Highly relevant. These results meet the requirement of “news/information or how to videos” about Facebook the company or product/service. Hence, these are Highly Relevant.
Facebook (de-DE) | User intent can be (a) to view videos from facebook.com that are useful/popular among German users, or (b) view news & info/how to videos about Facebook the company or its products. | Video Url (link) | Language rating: Severely degrades, as videos in Spanish are not useful in Germany.
El Pais (es-ES) | User intent is to view videos from El Pais newspaper (website or youtube/facebook channel). | Video url (link), Video url (link) | Relevance rating: highly relevant (as these videos meet the requirement of being from El Pais.com or the El Pais channel on youtube/facebook).
