Popple Rater - Query Categorization (7.22.2021)
CONTENTS
1 – Introduction
1.1 Before you Begin
1.2 Research
2 – Valid Queries
2.1 Valid – Location Relevant:
2.2 Valid – Location Not Relevant:
3 – Vague
4 – Foreign
4.1 Brands and Names:
4.2 Websites/Partial Website:
4.3 Borrowed Words:
4.4 Translation Seeking Intent:
4.5 Partially Foreign Query:
4.6 Foreign Script:
4.7 Spanish (United States):
5 – Inappropriate
5.1 Ambiguous Inappropriate
5.2 Clearly Inappropriate
1 – INTRODUCTION
Query Categorization consists of 1 search query, 1 user location and 5 category options. The
goal of this task is to understand the query’s intent, think about whether the user’s location
impacts the query’s intent, and then decide which of the 5 categories the query belongs in.
1.1 BEFORE YOU BEGIN
You must complete this task on a desktop or laptop. Never use a mobile phone or a tablet.
Please ensure your computer is configured to be in the language of the project you are
working on. You are expected to always work using an Incognito Window in Chrome. You
should be logged out of all personal accounts.
In preparation for the task, please download the Popple Query Categorization
Guidelines from the Popple Project page in Appen Connect before you access the job.
1.2 RESEARCH
The key to rating correctly is making sure that you understand the query and what kind of
results the user intended to find. Ask yourself: what kind of information did the user who
searched for this query intend to see? Will the user’s location impact the results?
Researching the query might be necessary for queries that you are not familiar with at first
glance.
Because every query has a unique location associated with it, you should use the search
engine buttons on the tool to conduct your research. The buttons will enable you to see
results that are catered to the “User’s Location”. Buttons are shown below:
2 – VALID QUERIES
Valid Query Definition: A query is a search a user conducted, and the user intent is the
purpose of the search. Valid queries are searches where the intention of the search is
understood. For example, if the query is pizza the user’s intent was likely to find results
related to pizza (restaurants, recipes, general information about pizza etc.).
A query’s intent can sometimes be difficult to understand, but after enough research you
will find that the majority of queries have a clear intent. Most of the queries that you see will be
valid. Examples of Valid queries:
• Gibberish: Sometimes a query that looks like gibberish can be a meaningful query
once research is conducted.
o Example Query: bhldn
▪ Research Findings: The query looks like gibberish, but it is the name of a bridal shop.
o Example Query: BCl3
▪ Research Findings: It is the formula for a chemical compound.
• Numbers: Queries that are numbers such as phone numbers, IP addresses or math
equations are valid queries.
o Example Query: 127.0.0.0
▪ Research Findings: It is an IP address, and someone may want to learn
about a loopback IP address.
o Example Query: +1 (757) 933-4396
▪ Research Findings: It is a phone number, and someone wants to
look up a business or person that called them recently.
o Example Query: 18/100
▪ Research Findings: Someone wants to know more about fractions and
percentages.
• Possible Spelling Mistakes: Partial queries or queries where a user could have
mistyped a query should not be marked as vague. If the search engine auto-corrects
the query and offers a suggestion that is reasonable, then it is a valid query.
o Example Query: fcebook OR faceb
▪ Research Findings: The search engine produces results for “Facebook”, and it’s
likely the user misspelled the query.
o Example Query: wmut
• Phrases: Sometimes what looks like a meaningless phrase can be song lyrics, poems
or quotes.
o Example Query: i have eaten the plums
▪ Research Findings: A quote from a poem published in 1934 that
recently resurfaced and became a meme. It’s a valid query.
• Names: You will encounter queries that are searching for a person’s name. While
there are many people with the same name, these queries are not vague. The intent
is clear: the user is trying to find a person with that name.
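The three Numbers examples in the list above (an IP address, a phone number and a fraction) can be told apart mechanically. The following is a minimal sketch only, not part of the rating tool: the function name, the labels and the phone pattern are assumptions made for illustration, and real research should still go through the search engine buttons.

```python
import ipaddress
import re
from fractions import Fraction

# Hypothetical, deliberately loose pattern for North American phone numbers.
PHONE_RE = re.compile(r"^\+?1?[\s.-]*\(?\d{3}\)?[\s.-]*\d{3}[\s.-]*\d{4}$")
# A simple "numerator/denominator" pattern such as 18/100.
FRACTION_RE = re.compile(r"^(\d+)\s*/\s*(\d+)$")


def describe_numeric_query(query: str) -> str:
    """Return a rough label for a number-like query, or 'unknown'."""
    text = query.strip()

    # Standard-library parsing decides whether the text is an IP address.
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        pass
    else:
        return "loopback IP address" if address.is_loopback else "IP address"

    if PHONE_RE.match(text):
        return "phone number"

    match = FRACTION_RE.match(text)
    if match and int(match.group(2)) != 0:
        value = Fraction(int(match.group(1)), int(match.group(2)))
        return f"fraction ({float(value) * 100:g}%)"

    return "unknown"


for example in ["127.0.0.0", "+1 (757) 933-4396", "18/100"]:
    print(example, "->", describe_numeric_query(example))
```

A label other than "unknown" only suggests that the query has a plausible intent; it does not replace the rater's own judgment.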
2.1 VALID – LOCATION RELEVANT:
Location Relevant Definition: If search results for valid queries are specific to the user’s
location, then the query is Location Relevant. For example, if two users in different cities
search for the query “dentists” they will see specific results that are catered to their unique
location. This makes the user’s location relevant, and this project aims to capture whether
the user’s location is relevant or not. Examples below are Location Relevant:
Please remember that Location Relevant does not mean that the query should be a
destination like Yosemite or Hollywood.
Every query will have a unique User Location that must be taken into consideration when
the query is Valid. Because every query has a unique location assigned to it you must use
the search engine buttons on the tool to conduct all research. Clicking the buttons allows
you to see results that are tailored to the user’s location.
Here are some more examples of queries that are Location Relevant:
• Queries with words such as “near me” or “nearby”: pizza near me, dentists near me
• General services for which one might expect local results: cabs, restaurants, gas
stations.
• Information seeking queries for which the results might be different based on
location within a locale: state lotto results, driver’s license requirements, dmv hours,
median house prices.
• Queries with vague location: “dentists Springfield” is location relevant because there
are multiple Springfields in the US. However, “dentists Springfield, IL” is not location
relevant because it’s too specific and becomes a unique location.
• Queries that are offering coupons or deals can sometimes be location relevant but
not always. If the sale/coupon is region based, meaning that it’s only for people in
that area, then it is considered location relevant.
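A few of the bullets above can be turned into a rough first-pass check. This is a sketch under stated assumptions, not the project's method: the term lists are toy data invented for illustration, the two-letter suffix test only approximates "a specific location such as Springfield, IL", and the function name is hypothetical. The check knows nothing about coupons or explicit city names such as "cabs in Los Angeles", so research with the search engine buttons remains the deciding step.

```python
import re

# Phrases that explicitly ask for local results.
NEAR_ME_RE = re.compile(r"\b(near me|nearby)\b", re.IGNORECASE)
# A trailing ", IL" style qualifier pins the query to one unique place.
STATE_SUFFIX_RE = re.compile(r",\s*[A-Z]{2}$")

# Hypothetical toy data for illustration only.
LOCAL_SERVICE_TERMS = {"cabs", "restaurants", "gas stations", "dentists"}
AMBIGUOUS_PLACE_NAMES = {"springfield"}


def looks_location_relevant(query: str) -> bool:
    """First-pass guess at whether results depend on the user's location."""
    text = query.strip()
    lowered = text.lower()

    if NEAR_ME_RE.search(lowered):
        return True
    if STATE_SUFFIX_RE.search(text):
        # The query names a unique location, so the user's location is moot.
        return False
    if any(place in lowered for place in AMBIGUOUS_PLACE_NAMES):
        # Several places share the name, so the user's location disambiguates.
        return True
    return any(term in lowered for term in LOCAL_SERVICE_TERMS)


for example in ["pizza near me", "dentists Springfield", "dentists Springfield, IL"]:
    print(example, "->", looks_location_relevant(example))
```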
2.2 VALID – LOCATION NOT RELEVANT:
Location Not Relevant Definition: If a Valid query returns the same results
for all users regardless of their location, then the query is Location Not Relevant. For
example, if two users in different cities search for the query “cookie recipe” they will see the
same results despite differences in their geographic locations. This makes the user’s location
not relevant, and this project aims to capture whether the user’s location is relevant or not.
Examples below are Location Not Relevant:
It is important to note that queries that specify a location within the query are NOT Location
Relevant. See examples below:
o User Location: San Jose, California
▪ Research Findings: This is not location relevant. The user is located
in San Jose but is looking for cab information that is specific to Los
Angeles. The results for the query will always show cab
information for Los Angeles no matter where the user is located.
▪ Research Findings: Some queries might seem like they are
Location Relevant, but they are not. The query “airline tickets” for
example will give you the same result no matter where a user is
located.
Here are some more examples of queries that are NOT Location Relevant:
• Web-based companies like Yahoo or Google are not location relevant.
• Queries with locations/addresses: “106 foxwood dr Jericho”, “Las Vegas, NV”,
“Walmart in San Jose”, “where is Laredo Texas”.
• Unique landmarks/places of business: Alcatraz Island, The French Laundry
• Queries seeking the websites of businesses with physical locations will never be
3 – VAGUE
Vague Definition: Queries where the intent of the user is not strong. Queries can have
multiple strong intents; however, Vague queries are only those where there is no clear
intent.
Only a small fraction of the queries you encounter will be Vague. If you find yourself rating a
lot of queries Vague, that is a good indication that you need to spend more time on
research.
• Gibberish: Queries that do not have a clear and reasonable meaning are considered
gibberish. Oftentimes they are just random text with no significance.
• Lacks Intent: If you cannot find a strong possible intent for a query, then the query is
likely vague.
• No Results: Very specific URLs that return only one or two results or no results at all
can be marked vague.
• Incomplete Websites: When a query is seeking a site but is incomplete, like www.r,
you should mark it Vague unless you can identify the user’s intent, as with www.amaz.
Examples of Vague queries:
• sgf k sg
• lmkmmmmmkmmmmmmm i'm mkmmmm
• www.r
4 – FOREIGN
Foreign Definition: Queries that are not in your locale’s language are considered foreign
queries, and such queries should be marked Foreign. The exceptions to this rule are outlined
below:
Sometimes you may see a query from a foreign language, but it may be a valid query. For
example, [arc de triomphe] is a query in French; however, a quick search shows the user is
looking for information on the famous landmark in France. This type of query should be
rated for relevance. We do not consider these foreign queries.
4.1 BRANDS AND NAMES:
Queries that contain only names of brands (Pepsi), places or landmarks (Arc de Triomphe,
Teotihuacan, or Tower of London), application names (WeChat), food items (pizza or sushi),
people (Anne Hathaway or Haruma Miura), bands (Rammstein or Maroon 5), or products
(iPhone) are not considered foreign.
However, when you add additional terms such as “ciudad de Paris” or “entregas de pizza” or
“Maroon 5 musica” then it is important to consider what language those terms are in
when evaluating whether or not the query is foreign.
4.2 WEBSITES/PARTIAL WEBSITE:
Websites should not be rated foreign, even if the words that form the website name are not
in your language.
4.3 BORROWED WORDS:
Common foreign words such as “sushi”, “karma”, “anago”, “karaoke”, “taco” are not foreign
as they are popular enough that they have been adopted by most countries.
4.4 TRANSLATION SEEKING INTENT:
When the user intent is to find a translation or definition of a foreign word, for example
“meaning of 今年冬至”, that query is NOT foreign. Such queries are not foreign because
they are seeking to translate a foreign word into the language of the user.
4.5 PARTIALLY FOREIGN QUERY:
Queries that have one or more words that are foreign and do NOT have translation seeking
intent ARE foreign.
The queries below are considered foreign because they have at least one word that is
foreign and it can change the intent of the query. We do not expect anyone to understand
any language other than their locale language and English, thus such queries should
be rated foreign.
(All examples in the guidelines are from the perspective of the English (United States) locale.)
4.6 FOREIGN SCRIPT:
A query that is fully in foreign script is generally a tell-tale sign of a foreign query. For example,
今年冬至 in the English (United States) locale and all other English-speaking locales can be
marked as a foreign query.
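The "fully in foreign script" test can be approximated in code for a locale whose language is written in Latin script, such as English (United States). This is a minimal sketch: the function name is hypothetical, and the Latin-script assumption does not hold for every locale.

```python
import unicodedata


def is_fully_foreign_script(query: str) -> bool:
    """True if the query has letters and none of them are Latin letters.

    Assumes the rater's locale uses Latin script (an assumption made for
    this illustration, e.g. English, United States).
    """
    letters = [ch for ch in query if ch.isalpha()]
    if not letters:
        # Digits and punctuation only, e.g. an IP address: not a script question.
        return False
    return not any(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


print(is_fully_foreign_script("今年冬至"))
print(is_fully_foreign_script("meaning of 今年冬至"))
```

The second query mixes scripts, so the check returns False for it. That fits the translation seeking exception, although the check does not detect that intent itself.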
4.7 SPANISH (UNITED STATES):
This rule ONLY applies to the Spanish (United States) locale. Queries that are in English or
Spanish are not considered foreign. If you are not in the Spanish (United States) Popple
roster, then please disregard this rule.
5 – INAPPROPRIATE
Inappropriate or profane/offensive content seeking queries should be marked as
inappropriate. No results are expected for these because our search does not support this
query class.
• It describes graphic violence and/or would likely retrieve grisly content.
inappropriate if you research the query’s meaning/intent. You must research queries whose
intent is not clear or known to you.
• Beeg
• Transxlist
• Desire5k
• Cfake.com
If most of the results are inappropriate, then the query should be rated inappropriate.
5.2 CLEARLY INAPPROPRIATE
Other queries will be clearly inappropriate and little to no research will be required. Such
queries will have words like “porn”, derogatory terms or slurs in the query and the intent to
search for inappropriate content will be clear.
If a query has results that are mostly inappropriate, then that is a clear indication that the
query itself is inappropriate.
It is important that only queries that fall into the inappropriate category are marked as
inappropriate. Some queries can seem inappropriate at a glance but after research you may
discover that they are not inappropriate at all. For example, queries that seek information
6 – QUERY CATEGORIZATION DECISION TREE
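As a rough companion to the decision tree, the five categories can be sketched as a sequence of checks. The order of the checks is an assumption and is not taken from the guidelines. The class, field and function names are all hypothetical, and each field stands for a judgment the rater reaches through research.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    VALID_LOCATION_RELEVANT = "Valid – Location Relevant"
    VALID_LOCATION_NOT_RELEVANT = "Valid – Location Not Relevant"
    VAGUE = "Vague"
    FOREIGN = "Foreign"
    INAPPROPRIATE = "Inappropriate"


@dataclass
class Findings:
    """A rater's research conclusions for one query (hypothetical structure)."""
    inappropriate: bool             # most results are inappropriate
    foreign: bool                   # not in the locale language, no exception applies
    intent_understood: bool         # at least one clear intent was found
    results_vary_by_location: bool  # results are tailored to the user location


def categorize(findings: Findings) -> Category:
    # Assumed order: rule out Inappropriate and Foreign, then Vague, and
    # only then decide whether a Valid query is Location Relevant.
    if findings.inappropriate:
        return Category.INAPPROPRIATE
    if findings.foreign:
        return Category.FOREIGN
    if not findings.intent_understood:
        return Category.VAGUE
    if findings.results_vary_by_location:
        return Category.VALID_LOCATION_RELEVANT
    return Category.VALID_LOCATION_NOT_RELEVANT


# Example: "dentists" searched from any city is clear in intent and local.
print(categorize(Findings(False, False, True, True)).value)
```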