Penguin - External Updated Guidelines For Query To Interest Labeling (4 2023)


Guidelines for Query Interest Tagging

[CONFIDENTIAL - DO NOT DISTRIBUTE]


Table of Contents
Last Updated: 01/18/2023

Intro to Taxonomy Node Tagging

0. Welcome
0.1 Glossary
0.2 Steps
1. Review the Entities
2. [AS NEEDED] Select a Checkbox and leave comments
3. Taxonomy Node Association
1.0 Query Entities
1.1 Vertical Examples
1.2 Ambiguous examples
2.0 L1 Verticals

Intro to Taxonomy Node Tagging


0. Welcome
Our goal is to find the best entities to represent the meaning of the nodes in our taxonomy.

We will do this by reviewing entities (queries) and selecting the most appropriate node(s) (categories).

0.1 Glossary
● Entity: In this workflow, an entity is a search query.

● Vertical: A vertical is a top-level (L1) category in the taxonomy.
○ For example, Food and Drinks is a vertical.

● Node: A node is a category in the taxonomy.
○ For example, Alcoholic Drinks is a node.

● Intent: Intent is what the average user is likely looking for, thinking about, researching,
or wanting to know more about when they search for a particular query.
● Parent/Child/Grandparent: Terms used to describe relationships in the taxonomy. A node's
children are the nodes directly under it. For example, Tequila is a child of Liquor, and
Liquor is the parent of Tequila. Similarly, Alcoholic Drinks is Tequila's grandparent.
○ Food and Drinks > Drinks > Alcoholic Drinks > Liquor > Tequila

0.2 Steps

1. Review the Entities

○ Click on the linked entity and review the Google search results.
○ Identify the entity's main L1 verticals (top-level categories) based on its overall
content and intent (i.e., the purpose of or reason for the search).
i. NOTE: If you are not familiar with an entity, research it.
○ Think about the overall intention of the vertical you're considering and ask
yourself:
i. What types of queries are intended for the vertical?
ii. Which pinners, in which state of mind, are likely to engage with these
queries?
iii. What results are expected when searching for a specific query?

2. [AS NEEDED] Select a Checkbox and leave comments:

○ The link for the entity is broken or there is nothing to label.
■ If the entity is not available to judge (dead page, blank page, etc.), check
this box and move on.

○ The entity is too ambiguous.
■ If the query results are a mix of intents and you're unable to mentally
summarize the potential intents, check this box and skip labeling.

○ This entity does not fit with any node at all.
■ If the entity is not ambiguous, but no available taxonomy node represents
it, check the box, leave a comment noting what node is needed, and skip
labeling.

○ I do not understand the language of this entity.
■ If the entity is in a non-local language other than English, or in a
language you do not understand, check the box and skip labeling.
■ NOTE: Without spending too much time on it, do your best to understand
the entity (e.g., via Google Translate) before selecting this checkbox.

○ This entity is unsafe.
■ Review the search results. If the content of the entity is unsafe, check the
box and skip labeling.
■ A query is unsafe if it has antagonistic, explicit, false or misleading,
harmful, hateful, or violent content or behavior.
■ Examples of unsafe queries: aex party, naked atheletes, underslung
shotgun, crotch shots, make a rifle stock, lolitas

3. Taxonomy Node Association

○ Select ALL of the most granular and appropriate node(s) under the identified L1
categories. Remember to think about the overall intention of the vertical.

i. A node named "X" under vertical "Y" should be selected when the entity
relates to a pinner's need to solve the following: "I want to do
something about X (the node) under the intent of Y (the L1 vertical)."
ii. Keep the intent of the L1 vertical in mind when selecting a node. Both the
L1 vertical and the selected node must make sense for the entity in
order to be selected.
1. Example: A query about "how to grow cilantro" makes sense in
the Gardening vertical, but not in the Food and Drinks vertical,
because it's not about how to use cilantro in recipes.

We would not select the node named "Cilantro" under the
vertical "Food and Drinks" here, because it does not help the
pinner solve "I want to do something about growing cilantro in
Gardening."

The appropriate nodes to select are the granular nodes
"Planting Ideas" and/or "Herb Garden" in the Gardening vertical,
because selecting these helps the pinner solve "I want to do
something about growing cilantro in Gardening."

iii. For queries, this means the query is what a pinner might search for
when facing the problem.

○ Children of each parent node are displayed directly below the node name (in the
section “Additional related interests under it”) to help you feel confident you are
selecting the most appropriate node for the query.

○ When you find a node that represents the intent of the query, click on the
"Select this category" link. Your node selection should appear below the
selection tree box. If needed, click the "x" or "Deselect this category" to
remove the associated node.
i. When you select a node, its parent, grandparent, etc., are automatically
included, as you’ll see below the selection tree box. No need to select the
parent, grandparent, etc., individually.
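The guidelines do not describe how the labeling tool stores the taxonomy. As a minimal sketch, assuming each node simply records its parent, the automatic inclusion of parents and grandparents can be modeled as walking up from the selected node to its L1 vertical. The `PARENT` mapping and the helpers `path_to_root` and `expand_selection` are hypothetical, and the mapping holds only the two paths named in this guide, not the real taxonomy.

```python
# Hypothetical fragment of the taxonomy (not the tool's real data model):
# each node maps to its parent; L1 verticals map to None.
PARENT = {
    "Food and Drinks": None,
    "Drinks": "Food and Drinks",
    "Alcoholic Drinks": "Drinks",
    "Liquor": "Alcoholic Drinks",
    "Tequila": "Liquor",
    "Gardening": None,
    "Planting": "Gardening",
    "Plants": "Planting",
    "Planting Ideas": "Plants",
}


def path_to_root(node: str) -> list[str]:
    """Return the path from the node's L1 vertical down to the node itself."""
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = PARENT[current]  # step up to the parent
    return list(reversed(path))


def expand_selection(selected: list[str]) -> set[str]:
    """Return the selected nodes plus all of their ancestors."""
    expanded: set[str] = set()
    for node in selected:
        expanded.update(path_to_root(node))
    return expanded


if __name__ == "__main__":
    # Food and Drinks > Drinks > Alcoholic Drinks > Liquor > Tequila
    print(" > ".join(path_to_root("Tequila")))
    print(sorted(expand_selection(["Tequila", "Planting Ideas"])))
```

This is why labelers only need to pick the most granular node: in this model, selecting "Tequila" already implies Liquor, Alcoholic Drinks, Drinks, and Food and Drinks.
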
1.0 Query Entities
● Queries will point to Google search results. Google provides a good, relatively unbiased
and comprehensive view of what a query is about and what people most likely want to
see given the query.

● One- and two-word queries are often very difficult to categorize ("distraction," "nice boy").
○ If you cannot categorize the query, it may be most appropriate to select the
"The entity is too ambiguous" checkbox and skip labeling.

● If a query could have multiple meanings, but you still have an idea of what the intent is,
follow the guidelines and "Select nodes within as many verticals as the
intents of the entity." Some examples:
○ "therapeutic activities": labelers can select Physical Health AND Mental Health.
○ "cat and man": labelers can select Entertainment AND Photography (Google
search shows those to be the likely reasons for the query).
○ "yellow ico" could be a misspelling of "yellow icon" or could be the Final Fantasy
character "ico yellow." Labelers can select Digital Art AND Entertainment >
Fantasy Games.

● Use your best judgment, but as a general guideline, focus only on:
○ The FIRST PAGE of Google search results; and
○ How many times something appears (e.g., "ico yellow" only appears once).

● Not every query with a keyword belongs in that node.
○ For example, not all "x images" queries belong in Photography, and not all "y
video" queries belong in Video.
○ Also, even if the keyword "images" is not used, be thoughtful about when you
select Art (Photography, Illustration, etc.). See the examples "alone in street"
vs. "jawa poster" below.

● While we do want all relevant nodes selected, it is OK to label on the stricter side.
○ For “dog drawing,” they’re interested in dogs, but they’re looking for a drawing, or
wanting to learn how to draw one. In this example, only a node for ‘drawing’ is
needed.
○ For “womens skirt pattern,” they’re interested in getting a pattern to make a skirt.
They want a pattern, and in the end, they want a skirt. A node(s) for the pattern
and a node for the skirt are needed.

● If there is a crafting element to a query's results, such as a handmade tablecloth, a
parent-made game, or handcrafted jewelry, also select the most granular node within
DIY and Crafts because of the crafting element, even if the intent isn't DIY. For
example, jewelry from Etsy isn't DIY, but it is a craft because it's handcrafted.

1.1 Vertical Examples

Note: The nodes in red in the examples below are the most granular selectable nodes. The
grayed-out nodes can be found under "Additional related interests under it" in Appen (ADAP)
and are provided for context.

Query: jawa poster
● Nodes:
○ Art > Poster Design
○ Home Decor > Wall > Wall Art > Posters, Prints, & Visual Artwork
● Notes: Search tells us jawa is a motorcycle brand or a character in Star Wars. This
query is about someone looking for or wanting to create a poster of a motorcycle, not a
motorcycle. Select labels for the poster only.

Query: crocheted afghan
● Nodes:
○ DIY and Crafts > Fabric Crafts > Knitting and Crochet > Crochet > Crochet for Home >
Crochet Blanket
○ Home Decor > Room Decor > Bedroom > Bed Linens > Blanket
● Notes: Search results show ones to make and ones to buy.

Query: alone in street
● Nodes:
○ Art > Photography
● Notes: This query is more ambiguous, but search tells us it has an artistic element, so
selecting L1 Art is OK here. Google takes it a step further and specifies Photography.

Query: afternoon tea london best
● Nodes:
○ Travel > Restaurant
○ Travel > Travel Destinations > Europe Destinations > UK and IE Destinations > UK
Destinations > London
● Notes: This query is about finding the best place in London for afternoon tea. They
want to be in London, and they want a tea parlor.

Query: yellow ico
● Nodes:
○ Art > Illustration > Icon Illustration
○ Design > Web and App Design > App Icon Design
● Notes: Search results show this is most likely a misspelling of "yellow icon." We see
one result of a game character with a similar name, but overall, this query seems to be
looking for icons.

Query: cristiano ronaldo sem camisa
● Nodes:
○ Sport > Soccer > Soccer Players > Cristiano Ronaldo
○ Entertainment > Celebrities > Celebrity Photos
● Notes: Seeking images of the soccer player, shirtless.

Query: maternity photography race car
● Nodes:
○ Art > Photography > Photography Subjects > People Photography > Family Photography >
Maternity Photography
● Notes: This query is not related to maternity health or to Motor Sports racing. They're
either looking for ideas for a maternity shoot, or want to look at photos with a
maternity theme.

Query: new puppy announcement
● Nodes:
○ Event Planning > Personal Celebration
● Notes: The intent of this query is not to look at images of dogs.

Query: boho cale
● Nodes:
○ Food and Drinks > Desserts > Cake > Cake Design
● Notes: This query is most likely a misspelling of "cake." This is a query looking for the
dessert. Do not assume Event Planning or any other node.

Query: etsy earrings
● Nodes:
○ Women's Fashion > Women's Jewelry and Accessories > Women's Jewelry > Ear Jewelry
○ DIY and Crafts
● Notes: Search results show Etsy earrings are likely handcrafted, but they are not
clearly DIY, so stop at DIY and Crafts instead of going deeper to DIY Jewelry.

Query: 1st birthday party
● Nodes:
○ DIY & Crafts > DIY Event > DIY Birthday
○ Event Planning > Hosting Occasions > Kids' Party
○ Event Planning > Personal Celebration > Birthday > 1st Birthday
● Notes: Google shows ideas for throwing a birthday party for a 1-year-old, ranging from
buying supplies to creating decorations.

Query: how to grow cilantro
● Nodes:
○ Gardening > Planting > Plants > Planting Ideas
○ Food and Drinks > Condiments > Herbs > Cilantro (do not select)
● Notes: This fits the Gardening vertical since the intent is to grow it. It doesn't make
sense in Food and Drinks, even though you'll find "Cilantro" there, because it's not about
using it in a recipe.

Query: womens skirt pattern
● Nodes:
○ Women's Fashion > Women's Bottoms > Skirts
○ DIY and Crafts > Fabric Crafts > Sewing > Sewing Patterns
○ DIY and Crafts > Fabric Crafts > Clothes > Clothes Pattern
● Notes: User is looking for a pattern to make a skirt. They want a pattern because they
want a skirt. Choose both concepts.

Query: 49ers sweatshirt
● Nodes:
○ Women's Fashion > Women's Top > Sweatshirt
○ Men's Fashion > Men's Shirts and Top
● Notes: Results show men's and women's, so include both paths. As with the jawa poster,
they're after an item, a sweatshirt, so we don't select NFL.

Query: repurpose mens ties
● Nodes:
○ DIY and Crafts > Fabric Crafts
○ Men's Fashion > Men's Accessories > Men's Tie (not selected)
● Notes: Results are about using men's ties for reasons other than men's fashion, turning
them into something else. The intent is fabric craft. Although they're using men's ties,
the intent isn't about Men's Fashion, so it is not selected.

Query: snoopy
● Nodes:
○ Entertainment
○ Food and Drinks > Nuts > Peanuts > Snoopy (incorrect L1, do not select)
● Notes: For a node that has an incorrect L1 vertical, the best action for labelers is to
select the best L1. The correct L1 here is Entertainment, not Food and Drinks.

Query: penne alla vodka
● Nodes:
○ Food and Drinks > World Cuisine > Italian Recipes > Pasta
○ Food and Drinks > Drinks > Alcoholic Drinks > Liquor > Vodka
● Notes: Here, vodka is a main ingredient, although it is not used as a drink. It is still OK
to select Pasta and Vodka, as the intent is about the L1 Food and Drinks, and both the Pasta
and Vodka nodes are in this vertical.

Query: valentines day design graphic
● Nodes:
○ Event Planning > Holiday > Valentine's Day
○ Design > Graphic Design
● Notes: This query is about graphic design for Valentine's Day.

1.2 Ambiguous examples

In the examples below, select "The entity is too ambiguous."

Query: crushi
● Notes: No clear, repeated possibilities in results.

Query: spending time
● Notes: Maybe the song, or a book, or a general way of life…

Query: black full set
● Notes: Nails? Bedding? Key caps? Unclear.

Query: maladive
● Notes: A misspelling of Maldives? "Sickly" in French, or an uncommon word for sickly in
English. Unclear.

A note on Interior Design vs. Home Decor:

● An interior designer works with architects to improve a space's functionality, and with the
client to make the space work for them. Interior designers change the structure of rooms
and buildings, not just their contents, and they also decorate.
● A decorator works with the existing structure.

2.0 L1 Verticals

Below are the verticals (L1 categories) of the taxonomy.

Animals
Architecture
Art
Beauty
Children's Fashion
DIY & Crafts
Design
Education
Electronics
Entertainment
Event Planning
Finance
Food and Drinks
Gardening
Health
Home Decor
Men's Fashion
Parenting
Quotes
Sport
Travel
Vehicles
Wedding
Women's Fashion
