
Classify Intent of the Query – Guidelines

How to perform the task


In this task you will classify queries as ADULT, JUNK, or FOREIGN based on the text and intent of the query. Although this may seem like a trivial task, there are many borderline cases, for which more detailed guidelines are listed below. You can, and in fact SHOULD, mark some queries with a combination of these three categories. For example, “tetek lucah 3451gsd545” is ADULT, JUNK, and FOREIGN in non-Malay markets. This document is written from the perspective of someone in an English-speaking region; when interpreting it, apply its intent to your own locale.

Adult Classification
Queries that have adult intent and that return sexual or violent content should be marked as ADULT. Both criteria must be met for a query to be considered ADULT. In rare cases you may see queries that are searching for child pornography. If you encounter one, mark it as ADULT and also report a technical issue. Queries seeking strictly informational results about sexual or violent content, or whose search results are strictly or primarily informational, do not fall into this category.

These are ADULT:


adultfriendfinder
- The user is searching for adult content, so it should be marked as ADULT.
nsfl fatal accident pictures
- The user is searching for gory content.
cream pie порно
- The user is searching for sexual content (порно is Russian for “porn”). This query is also FOREIGN if the expected language is not Russian. However, if you are working in Russia, you should NOT mark it as FOREIGN, because Russian is your expected language.

These are NOT ADULT:
korean fried chicken restaurants near me
- This query has nothing to do with adult content.
g-spot
- Whether or not the user intended to find sexual content, the results for this query are informational only, so it should not be marked ADULT.
documentaries on porn industry
- Both the user intent and the search results are informational.

Junk Classification
Queries that contain gibberish should be marked as JUNK, even if only part of the query is gibberish. A query made up entirely of legible words should still be marked as JUNK if it has no clear intent. However, minor misspellings should NOT be marked as JUNK, nor should queries that simply omit diacritics (e.g. é, à -> e, a). URL fragments and email addresses should NOT be marked as JUNK as long as it is clear that they refer to actual websites or email addresses. Keyboard errors, such as forgetting to switch to a foreign-language keyboard layout, should NOT be marked as JUNK. In general, when a query is ambiguous, defer to its intent rather than its literal text: if the query has a legitimate intent, it should not be marked JUNK.
These are JUNK:
houw do I logg in343zfto face bokasf333324g4gggggggggggggggg
- This query has a legitimate intent, but also has many spam characters interspersed.
hwawl4jhkl44jk;lhl
- This query consists only of meaningless spam characters.
this book book book can be book book book
- This query consists of actual English words, but has no meaning.
axabraxasmaxas123.com login
- The query is searching for a URL that does not exist as typed and that cannot reasonably be interpreted as a misspelling of a real site, so its intent cannot be determined.
C:\user\documents\programfiles txt
- This is not a query a user would type into a search engine, so it should be treated as JUNK.

These are NOT JUNK:


houw do I logg into face bok
- This query contains a couple of misspellings, but the intent is clear and it contains no spam characters. Use your judgment to decide how many misspellings are too many.
3betting with ace 4 or not
- This query refers to 3-betting (re-raising) in poker, even though it may look like nonsense to people who do not play.
rich guy site:onion.com
- This query uses the site: operator to restrict results to a website, and its intent is clear.
Qhanftjaxndj
- This is a search for a Korean website; however, the user forgot to switch to the Korean keyboard layout, so the query appears as gibberish. If the expected language of your task is not Korean, this query should also be marked as FOREIGN.
X Æ A-12
- This is the name Elon Musk originally chose for his son, so it should not be marked as JUNK.
dbjt25-130-2012
- This query is searching for an identification code for the author of a Chinese construction engineering design standard.

Foreign Query Classification


Whether a query is FOREIGN depends on the expected language of the task. PLEASE READ THE EXPECTED LANGUAGE CAREFULLY, as it tells you the language that most of the queries should be in. We recently changed the definition of FOREIGN and now expect you to use location and cultural context to determine whether a query is foreign. Our goal is to route each query to the people best able to understand its intent. Therefore, even if a query is in a language other than the expected language, check the location it was sent from and the user's intent to decide whether it should go to people in a country other than those where the expected language of this task is predominantly spoken. In general, you should consider four things:
1. Whether the language of the query matches the expected language completely, partially, or not at all.
2. Whether the query contains culturally specific search terms.
3. The location from which the query was sent.
4. Whether the user expects search results in a language that matches the expected language completely, partially, or not at all.
Based on these criteria, use your own judgment to decide whether the query would be more easily understood by people in a
foreign market.
These are FOREIGN (assuming an expected language of ENGLISH; your market might be different):
Qhanftjaxndj From: Seoul, Korea
- The user is searching for a Korean website but forgot to switch to the Korean keyboard layout. The text is in Latin letters, but the user expects Korean results.
dbjt25-130-2012 From: Tokyo, Japan
- This query is searching for an identification code for the author of a Chinese construction engineering design standard, which is best understood by people in the Chinese-language market. Since Chinese is not the expected language in this example, the query is FOREIGN.
buy geely car From: Beijing, China
- This query is entirely in English, but it was sent from Beijing and is about a Chinese car brand, so it is FOREIGN: people in the Chinese-speaking market are best able to understand its intent.

These are NOT FOREIGN (assuming an expected language of ENGLISH; your market might be different):
how do I write “hi” in Chinese From: Washington, United States
- Although the user expects to see the word “hi” in Chinese, they are almost certainly looking for English-language guides on the topic.
comprar una caña de pescar en Honolulu hawái From: Hawaii, United States
- Although this query is entirely in Spanish (“buy a fishing rod in Honolulu, Hawaii”), it was sent from the United States and asks how to buy a fishing rod in the state of Hawaii, so it belongs in the en-US market.
breaking bad season 1 From: Beijing, China
- The user is searching for an American TV show in English, so it would be best to send this to an English market
even though it was searched from China.
