
BING – JUDGING CONTENT CHANGE

GUIDELINE 1.0

CAUTION – THIS TASK MAY CONTAIN ADULT CONTENT. OPEN THIS TASK AS APPROPRIATE.

THIS DOCUMENT CONTAINS CONFIDENTIAL AND PROPRIETARY INFORMATION BELONGING TO
MICROSOFT CORPORATION.

THE RECIPIENT UNDERSTANDS AND AGREES THAT THESE MATERIALS AND THE
INFORMATION CONTAINED HEREIN MAY NOT BE USED OR DISCLOSED WITHOUT THE
PRIOR WRITTEN CONSENT OF MICROSOFT CORPORATION.

IF YOUR BROWSER KEEPS DOWNLOADING FILES INSTEAD OF SHOWING THE WEB PAGES IN
THE HIT, PLEASE TRY ANOTHER BROWSER SUCH AS MICROSOFT EDGE. THIS MAY BE A
BROWSER COMPATIBILITY PROBLEM.

CONTENTS

Content Change
Understanding the concept of main body
Judging content change
1. Search result pages or listing web pages
2. Other web pages (Not Search result pages, not listing pages)
2.1 Main content is the same
2.2 Main content is different
3. Unknown; Cannot judge content differences

CONTENT CHANGE

Content Change (CC) is a measure of how useful a change in a web page's content is to
search engines such as Bing.

Putting yourself in the shoes of a web site owner, you can see why detecting a
Content Change is important for search engines: to minimize the crawling cost for the
web site owner, search engine crawlers should avoid re-crawling a web page whose content
has not changed since the crawler's last visit.

In addition, putting yourself in the shoes of a search user, you can see why detecting a
Content Change is important for search engines: search users expect to always find the
latest content in search results pages and in the pages linked from them, including for
web pages that keep changing.

Figure 1: Today, the Bing search result link for this web page includes the name of the latest president, “Trump”; before, it was “Obama”.

Careful judgment is required to evaluate the Content Change of a page between the last
time the web page was crawled and the time it is crawled now. Just because a page looks
visually the same doesn’t mean that it does not include useful content change for Bing.
Similarly, just because a page does not look the same doesn’t mean that the content
change is useful for Bing.

UNDERSTANDING THE CONCEPT OF MAIN BODY

The Main Body represents the dominant content of a web page. The main content area consists
of content that is directly related to or expands upon the central topic of a document. It does
not include advertising, footers, or navigation elements. Content changes in the main body
are generally very important.

[Figure: Typical layout of a web page. Regions shown: header; horizontal navigation bar; vertical navigation bar, main body, and aside (sidebar); navigation; footer (institutional).]

[Figure: The main body section highlighted on an example web page.]

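As a rough illustration of the main body idea, the sketch below keeps only the text that sits outside header, navigation, aside, and footer elements. It assumes the page marks those regions with the HTML5 tags `header`, `nav`, `aside`, and `footer`, which many real pages do not; the names `SKIPPED_TAGS`, `MainBodyText`, and `main_body_text` are hypothetical, and advertisements are not detected at all.

```python
from html.parser import HTMLParser

# Assumption: these tags mark regions that are not part of the main body.
SKIPPED_TAGS = {"header", "nav", "aside", "footer", "script", "style"}


class MainBodyText(HTMLParser):
    """Collects text that is not nested inside any SKIPPED_TAGS element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every skipped region.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def main_body_text(html: str) -> str:
    """Return the page text with header/nav/aside/footer text left out."""
    parser = MainBodyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Comparing `main_body_text` of two crawled copies approximates the question a judge answers by eye: did anything change outside the navigation and footer?
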
JUDGING CONTENT CHANGE

Judging content change involves assessing the usefulness of the changes between two copies of
the same web page crawled at two different dates.

Detecting content change can take significant time, effort, and expertise. We have therefore
automated classification for many cases, so that judges are only presented with cases that
we cannot classify automatically.

For instance, if the HTML output is the same as at the last crawl, we assume that
the content displayed to the user is the same.
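One simple way to check that two HTML snapshots are identical is to compare a hash of each. This is only a sketch of the idea, not a description of how Bing does it; the function names `html_fingerprint` and `html_unchanged` are hypothetical.

```python
import hashlib


def html_fingerprint(html: str) -> str:
    """Return a SHA-256 digest of the raw HTML text."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def html_unchanged(previous_html: str, current_html: str) -> bool:
    """True when both snapshots are byte-for-byte identical after UTF-8 encoding."""
    return html_fingerprint(previous_html) == html_fingerprint(current_html)
```

Storing only the fingerprint of the previous crawl is enough for this check, so the full earlier copy does not need to be kept.
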

For instance, if some key HTML elements (the <title> tag, the <meta> description tag, the
<h1> tag, the <meta> robots tag, or HTML markup such as Open Graph, Schema.org, JSON, etc.)
have changed, we automatically detect these cases and judges don’t have to judge them.
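The key-tag check can be sketched with Python's standard `html.parser`. This is an illustration under stated assumptions, not the actual detection logic: it covers only `<title>`, `<h1>`, and the `<meta>` description and robots tags, leaves out Open Graph, Schema.org, and JSON markup, and the names `KeyTagExtractor`, `key_tags`, and `changed_key_tags` are hypothetical.

```python
from html.parser import HTMLParser


class KeyTagExtractor(HTMLParser):
    """Collects the text of <title> and <h1> and the content of two <meta> tags."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._open = None  # "title" or "h1" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._open = tag
            self.values.setdefault(tag, "")
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "robots"):
                self.values["meta:" + name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.values[self._open] += data


def key_tags(html: str) -> dict:
    """Return the key tag values found in the HTML, whitespace-trimmed."""
    parser = KeyTagExtractor()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.values.items()}


def changed_key_tags(previous_html: str, current_html: str) -> list:
    """Return the sorted names of key tags whose values differ between two crawls."""
    old, new = key_tags(previous_html), key_tags(current_html)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

A non-empty result from `changed_key_tags` corresponds to a case that would be classified automatically and never shown to a judge.
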

Judges are asked to classify content changes into one of these 4 categories:

• Search result pages or listing web pages

Other web pages (not search result pages, not listing pages):
• Main content is the same
• Main content is different

• Unknown; cannot judge content differences

1. SEARCH RESULT PAGES OR LISTING WEB PAGES

Search engines may occasionally link to search results pages on web sites, such as the Amazon
search results page https://www.amazon.com/socks/s?page=1&rh=i%3Aaps%2Ck%3Asocks. Content
in these pages keeps changing based on product availability. Search engines must crawl these
pages regularly, but without crawling them too often.

For all these pages, we ask judges not to judge the differences between the content. Simply
place all pages that look like search results or listing pages in this category.

This is another form of search results page, listing the most recent horoscopes.
We call these Listing pages, and they fall into this category.

The page on the left includes 6 houses and the page on the right includes 5 houses.
Another example of search result pages.

Another example of search result pages, displayed as a grid.


Another example of search result pages, listing a set of courses.

2. OTHER WEB PAGES (NOT SEARCH RESULT PAGES, NOT LISTING PAGES)

2.1 MAIN CONTENT IS THE SAME


In the example above, the main content is the same; this page must be classified as {main content is the
same}.

In the example above, the main content highlighted in green is the same; only the advertisements are
different. This page must be classified as {main content is the same}.

In the example above, both web pages are dead links. This page must be classified as {main content is
the same}.

In the example above, the page includes the same news article content. Only the links at the bottom, in
the section More from Polygon, have changed. These links are not part of the main body and are site-wide
navigation links. This page must be classified as {main content is the same}.

In the example above, the page includes the same news article content. Only the links at the bottom, in
the sections Top Trending Terms and More Read, have changed. These links are not part of the main
body and are site-wide navigation links. This page must be classified as {main content is the same}.

In the example above, the main content has a minor difference because the article includes the time since
publication. Such a difference must be disregarded, as the time will be new for each crawl. This page must be
classified as {main content is the same}.
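A comparison that ignores "time since publication" strings can be sketched by replacing them with a fixed placeholder before comparing. The pattern below is an assumption covering only English phrases such as "3 hours ago"; it deliberately leaves absolute publication dates untouched, since a changed publication date counts as a content change. The names `RELATIVE_TIME`, `normalize_relative_times`, and `same_ignoring_relative_times` are hypothetical.

```python
import re

# Assumption: relative times look like "3 hours ago", "a minute ago", "an hour ago".
RELATIVE_TIME = re.compile(
    r"\b(?:\d+|an?)\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b",
    re.IGNORECASE,
)


def normalize_relative_times(text: str) -> str:
    """Replace every relative-time phrase with a fixed placeholder."""
    return RELATIVE_TIME.sub("<relative-time>", text)


def same_ignoring_relative_times(previous_text: str, current_text: str) -> bool:
    """True when the two texts differ only in their relative-time phrases."""
    return normalize_relative_times(previous_text) == normalize_relative_times(current_text)
```
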

In the example above, one page includes a larger advertisement; the main content of the page is the same.
Such a difference must be disregarded. This page must be classified as {main content is the same}.

In the example above, the main body includes the same content and an advertisement that changes
whenever the page is loaded. All advertisements must be disregarded in judgments. This page must be
classified as {main content is the same}.

In the example above, one page includes a breaking news banner; the main content of the page is the same.
Such a difference is not related to the content and must be disregarded. This page must be classified as
{main content is the same}.

2.2 MAIN CONTENT IS DIFFERENT


In the example above, the main body text has changed, so this page must be classified as a
content change ({main content is different}).

In the example above, a valid page became a dead link, so this page must be classified as a
content change ({main content is different}).

In the example above, the page publication date has changed (it was July 02, 2018 and is now
July 5, 2018), so this page must be classified as a content change ({main content is different}).

In the example above, the main content is nearly identical, but the additional text “Follow city-
data.com founder on our Forum or @LechMazur” is new, so this page must be classified as a
content change ({main content is different}).

In the example above, a moderator is now needed, so this page must be classified as a content
change ({main content is different}).

In the example above, a main link has been changed in the main navigation, so this page must
be classified as a content change ({main content is different}).

3. UNKNOWN; CANNOT JUDGE CONTENT DIFFERENCES

In the example above, the tool was not able to access the content. This page must be classified
as {unknown; cannot judge content differences}.
