
POPPLE PRODUCTION - QUERY CATEGORIZATION

CONTENTS

1 – Introduction
    1.1 Before you Begin
    1.2 Research
2 – Valid Queries
    2.1 Valid – Location Relevant
    2.2 Valid – Location Not Relevant
3 – Vague
4 – Foreign
    4.1 Brands and Names
    4.2 Websites/Partial Website
    4.3 Borrowed Words
    4.4 Translation Seeking Intent
    4.5 Partially Foreign Query
    4.6 Foreign Script
    4.7 Spanish (United States)
5 – Inappropriate
    5.1 Ambiguous Inappropriate
    5.2 Clearly Inappropriate
    5.3 Not Inappropriate
6 – Query Categorization Decision Tree

1 – INTRODUCTION
Co

Query Categorization consists of 1 search query, 1 user location and 5 category options. The
goal of this task is to understand the query’s intent, think about whether the user’s location
impacts the query’s intent, and then decide which of the 5 categories the query belongs in.
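The five category options described throughout this document can be summarized as a simple enumeration. This is a minimal sketch only — the names below are illustrative, and the labels in the rating tool may differ:

```python
from enum import Enum

class QueryCategory(Enum):
    """The five category options a query can be placed in."""
    VALID_LOCATION_RELEVANT = "Valid - Location Relevant"
    VALID_LOCATION_NOT_RELEVANT = "Valid - Location Not Relevant"
    VAGUE = "Vague"
    FOREIGN = "Foreign"
    INAPPROPRIATE = "Inappropriate"
```

Note that Valid splits into two options depending on whether the user's location impacts the results, which is how one search query plus one user location yields five choices.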

1.1 BEFORE YOU BEGIN
You must complete this task on a desktop or laptop. Never use a mobile phone or a tablet.
Please ensure your computer is configured to be in the language of the project you are
working on. You are expected to always work using an Incognito Window in Chrome. You
should be logged out of all personal accounts.

In preparation for the task, please download the Popple Query Categorization Guidelines from the Popple Project page in Appen Connect before you access the job.

1.2 RESEARCH
The key to rating correctly is making sure that you understand the query and what kind of results the user intended to find. Ask yourself: what kind of information did the user who searched for this query intend to see? Will the user's location impact the results? Researching the query might be necessary for queries that you are not familiar with at first glance.

Because every query has a unique location associated with it, you should use the search engine buttons on the tool to conduct your research. The buttons will enable you to see results that are catered to the "User's Location". Buttons are shown below:

2 – VALID QUERIES
Valid Query Definition: A query is a search a user conducted, and the user intent is the purpose of the search. Valid queries are searches where the intention of the search is understood. For example, if the query is "pizza", the user's intent was likely to find results related to pizza (restaurants, recipes, general information about pizza, etc.).

A query's intent can sometimes be difficult to understand, but after enough research you will find that the majority of queries have a clear intent. Most of the queries that you see will be valid. Examples of Valid queries:

• Gibberish: Sometimes a query that looks like gibberish can be a meaningful query
once research is conducted.
o Example: bhldn
▪ The query looks like gibberish, but it is the name of a bridal shop.
o Example: BCl3
▪ Research Findings: It is the formula for a chemical compound.

• Numbers: Queries that are numbers such as phone numbers, IP addresses or math
equations are valid queries.
o Example Query: 127.0.0.0
▪ Research Findings: It is an IP address, and someone may want to learn
about a loopback IP address.

o Example Query: +1 (757) 933-4396
▪ Research Findings: It is a phone number, and someone wants to look up a business or person that called them recently.
o Example Query: 18/100
▪ Research Findings: Someone wants to know more about fractions and
percentages.

• Possible Spelling Mistakes: Partial queries, or queries where a user could have mistyped a query, should not be marked as vague. If the search engine auto-corrects the query and offers a suggestion that is reasonable, then it is a valid query.
o Example Query: fcebook OR faceb
▪ Research Findings: The search engine produces results for "Facebook", and it's likely the user misspelled the query.
o Example Query: wmut
▪ Research Findings: The query wmut is corrected to wmur, and the results are for a TV station, which has reasonable intent.

• Phrases: Sometimes what looks like a meaningless phrase can be song lyrics, poems or quotes.
o Example Query: i have eaten the plums
▪ Research Findings: A quote from a poem published in 1934 that recently resurfaced and became a meme. It's a valid query.

• Names: You will encounter queries that are searching for a person's name. While there are many people with the same name, these queries are not vague. The intent is clear: the user is trying to find a person with that name.

2.1 VALID – LOCATION RELEVANT:

Location Relevant Definition: If search results for valid queries are specific to the user’s
location, then the query is Location Relevant. For example, if two users in different cities
search for the query “dentists” they will see specific results that are catered to their unique
location. This makes the user’s location relevant, and this project aims to capture whether
the user’s location is relevant or not. Examples below are Location Relevant:


Please remember that Location Relevant does not mean that the query must be a destination like Yosemite or Hollywood.

Every query will have a unique User Location that must be taken into consideration when the query is Valid. Because every query has a unique location assigned to it, you must use the search engine buttons on the tool to conduct all research. Clicking the buttons allows you to see results that are tailored to the user's location.

o Example Query: cabs


o User Location: San Jose, California
▪ Research Findings: This query is Location Relevant because results show cab services that are specific to the San Jose, CA area. Notice how search results are different for users in San Jose vs New York.

Here are some more examples of queries that are Location Relevant:

• Queries with words such as “near me” or “nearby”: pizza near me, dentists near me

• General services for which one might expect local results: cabs, restaurants, gas stations.
• Information seeking queries for which the results might be different based on location within a locale: state lotto results, driver's license requirements, DMV hours, median house prices.
• Queries with vague location: "dentists Springfield" is location relevant because there are multiple Springfields in the US. However, "dentists Springfield, IL" is not location relevant because it's too specific and becomes a unique location.

• Queries that are offering coupons or deals can sometimes be location relevant but
not always. If the sale/coupon is region based, meaning that it’s only for people in
that area, then it is considered location relevant.
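The bullets above can be partially mechanized as a first-pass check. This is an illustration only — the term lists are hypothetical, and an actual rating still requires research with the location-aware search buttons:

```python
# Hypothetical lists for illustration; real judgments require research in the tool.
LOCAL_INTENT_TERMS = ("near me", "nearby")
LOCAL_SERVICE_QUERIES = {"cabs", "restaurants", "gas stations", "dentists"}

def suggests_location_relevant(query: str) -> bool:
    """First-pass check: does the query look Location Relevant per the bullets above?"""
    q = query.lower().strip()
    # Explicit local phrasing such as "pizza near me" or "dentists nearby".
    if any(term in q for term in LOCAL_INTENT_TERMS):
        return True
    # General services for which one would expect local results.
    return q in LOCAL_SERVICE_QUERIES
```

A check like this only flags the easy cases; queries such as "dentists Springfield" still need research to see whether the location name is ambiguous.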

2.2 VALID – LOCATION NOT RELEVANT:



Location Not Relevant Definition: If search results for Valid queries return the same results
for all users regardless of their location, then the query is Location Not Relevant. For
example, if two users in different cities search for the query “cookie recipe” they will see the
same results despite differences in their geographic locations. This makes the user’s location
not relevant, and this project aims to capture whether the user’s location is relevant or not.
Examples below are Location Not Relevant:

It is important to note that queries that specify a location within the query are NOT Location
Relevant. See examples below:

o Example Query: cabs Los Angeles
o User Location: San Jose, California
▪ Research Findings: This is not location relevant. The user location is San Jose, but the user is looking for cab information that is specific to Los Angeles. The results for the query will always show cab information for Los Angeles no matter where the user is located.

o Example Query: airline ticket
o User Location: San Jose, California
▪ Research Findings: Some queries might seem like they are Location Relevant, but they are not. The query "airline tickets", for example, will give you the same results no matter where a user is located.

Here are some more examples of queries that are NOT Location Relevant:


• Web-based companies like Yahoo or Google are not location relevant.
• Queries with locations/addresses: “106 foxwood dr Jericho”, “Las Vegas, NV”,
“Walmart in San Jose”, “where is Laredo Texas”.
• Unique landmarks/places of business: Alcatraz Island, The French Laundry
• Queries seeking the websites of businesses with physical locations will never be Location Relevant: "Walmart.com", "Macys online", "Chipotle online ordering", "library of congress website".
• Queries searching for online companies are never location relevant as results are the
same for all users regardless of their physical location.
• Information that does not change based on location: “Taco Bell menu”, “federal tax
rate”.
• Companies or chain companies that are no longer in business should not be location
relevant as physical locations no longer exist.
• Queries that are searching for people are not location relevant.
• Queries that are searching for coupons, deals or sales are not location relevant if
they are nationally available, meaning it is available to everyone across your country.

3 – VAGUE

Vague Definition: Queries where the intent of the user is not strong. Queries can have
multiple strong intents; however, Vague queries are only those where there is no clear
intent.

Only a small fraction of the queries you encounter will be Vague. If you find yourself rating a lot of queries vague, then that is a good indication that you need to spend more time on research.

• Gibberish: Queries that do not have a clear and reasonable meaning are considered
gibberish. Oftentimes they are just random text with no significance.
• Lacks Intent: If you cannot find a strong possible intent for a query, then the query is
likely vague.


• No Results: Very specific URLs that return only one or two results, or no results at all, can be marked vague.
• Incomplete Websites: When a query is seeking a site but is incomplete, like www.r, you should mark it vague, unless you can identify the user's intent, as with www.amaz (www.amazon.com), which you would mark as a valid query.

Examples of Vague queries:

• sgf k sg
• lmkmmmmmkmmmmmmm i'm mkmmmm
• www.r

4 – FOREIGN

Foreign Definition: Queries that are not in your locale’s language are considered foreign
queries, and such queries should be marked Foreign. The exceptions to this rule are outlined
below:

Sometimes you may see a query from a foreign language, but it may be a valid query. For example, [arc de triomphe] is a query in French; however, a quick search shows the user is looking for information on the famous landmark in France. This type of query should be rated for relevance. We do not consider these foreign queries.

4.1 BRANDS AND NAMES:

Queries that contain only names of brands (Pepsi), places or landmarks (Arc de Triomphe,
Teotihuacan, or Tower of London), application names (WeChat), food items (pizza or sushi),
people (Anne Hathaway or Haruma Miura), bands (Rammstein or Maroon 5), products
(iPhone), are not considered foreign.

However, when you add additional terms such as "ciudad de Paris" or "entregas de pizza" or "Maroon 5 musica", then it is important to consider what language those terms are in when evaluating whether or not the query is foreign.

• Foreign Example: Canciones de Maroon 5
• Foreign Example: Onde fica Paris
• Foreign Example: atriz Anne Hathaway

4.2 WEBSITES/PARTIAL WEBSITE:



Websites should not be rated foreign, even if the words that form the website name are not
in your language.

• Not Foreign: www.faceb
• Not Foreign: booking.com
• Not Foreign: https://www.louvre.fr/en
• Not Foreign: https://chapultepec.org.mx/

4.3 BORROWED WORDS:

Common foreign words such as “sushi”, “karma”, “anago”, “karaoke”, “taco” are not foreign
as they are popular enough that they have been adopted by most countries.

4.4 TRANSLATION SEEKING INTENT:

When the user intent is to find a translation or definition of a foreign word, for example,
“meaning of 今年冬至” that query is NOT foreign. Such queries are not foreign because
they are seeking to translate a foreign word into the language of the user.


4.5 PARTIALLY FOREIGN QUERY:

Queries that have one or more words that are foreign and do NOT have translation seeking
intent ARE foreign.

The queries below are considered foreign because they have at least one word that is foreign, and it can change the intent of the query. We do not expect anyone to understand any language other than their locale language and English; thus, such queries should be rated foreign.

(All examples in these guidelines are from the perspective of the English (United States) locale.)

o Example Query: America en Concacaf


o Example Query: Chivas hoy

4.6 FOREIGN SCRIPT:

A query that is fully in foreign script is generally a tell-tale sign of a foreign query. For example, 今年冬至 in the English (United States) locale, and all other English-speaking locales, can be marked as a foreign query.
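A fully-foreign-script query like 今年冬至 can be detected mechanically. Here is a minimal sketch using Python's standard unicodedata module, under the assumption (valid for English-speaking locales) that any letters outside the Latin script count as foreign script:

```python
import unicodedata

def is_fully_foreign_script(query: str) -> bool:
    """True when every letter in the query belongs to a non-Latin script."""
    letters = [ch for ch in query if ch.isalpha()]
    if not letters:
        return False  # numbers, URLs, etc. are handled by other rules
    # Unicode character names of Latin letters contain the word "LATIN".
    return all("LATIN" not in unicodedata.name(ch, "") for ch in letters)
```

Note that a translation-seeking query such as "meaning of 今年冬至" contains Latin letters and so is not flagged, which is consistent with Section 4.4.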


4.7 SPANISH (UNITED STATES):

This rule ONLY applies to the Spanish (United States) locale. Queries that are in English or Spanish are not considered foreign. If you are not in the Spanish (United States) Popple roster, then please disregard this rule.

5 – INAPPROPRIATE
Inappropriate or profane/offensive content seeking queries should be marked as
inappropriate. No results are expected for these because our search does not support this
query class.

A query is Inappropriate if it falls into any of the following categories:

• It encourages illegal behaviour (e.g., piracy, child pornography).


• It uses derogatory language to refer to certain groups (racial slurs), or otherwise
constitutes hate speech.
• It includes words generally considered profane.
• It is concerned with non-medical sexual practices or situations. This includes queries
that are clearly related to pornography.

• It describes graphic violence and/or would likely retrieve grisly content.

5.1 AMBIGUOUS INAPPROPRIATE


Most queries will require you to do research, as their meaning and intent will not always be straightforward. You will encounter queries that seem ambiguous but are clearly inappropriate if you research the query's meaning/intent. You must research queries whose intent is not clear or known to you.

Examples of ambiguous inappropriate queries:

• Beeg
• Transxlist
• Desire5k
• Cfake.com

If most of the results are inappropriate, then the query should be rated inappropriate.

5.2 CLEARLY INAPPROPRIATE
Other queries will be clearly inappropriate and little to no research will be required. Such
queries will have words like “porn”, derogatory terms or slurs in the query and the intent to
search for inappropriate content will be clear.

If a query has results that are mostly inappropriate, then that is a clear indication that the
query itself is inappropriate.

5.3 NOT INAPPROPRIATE

It is important that only queries that fall into the inappropriate category are marked as inappropriate. Some queries can seem inappropriate at a glance, but after research you may discover that they are not inappropriate at all. For example, queries that seek information about human anatomy or health are NOT considered inappropriate.

Here are example queries that are not inappropriate:


• was ciely gay on color purple.
• Human anatomy
• Loverbaby

6 – QUERY CATEGORIZATION DECISION TREE
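The decision tree can be sketched as a single function. The order of the checks below is an assumption (the guidelines do not prescribe one), and each boolean flag stands in for the research steps described in the sections above:

```python
def categorize(is_inappropriate: bool, is_foreign: bool,
               intent_is_clear: bool, location_matters: bool) -> str:
    """One possible reading of the decision tree; each flag comes from research."""
    if is_inappropriate:           # Section 5: inappropriate queries
        return "Inappropriate"
    if is_foreign:                 # Section 4: foreign queries (after exceptions)
        return "Foreign"
    if not intent_is_clear:        # Section 3: no clear intent after research
        return "Vague"
    # Section 2: Valid, split by whether the user's location impacts the results.
    if location_matters:
        return "Valid - Location Relevant"
    return "Valid - Location Not Relevant"
```

For instance, "pizza near me" in San Jose would flow through as not inappropriate, not foreign, clear intent, location relevant.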

