
Googlebot

Googlebot is the web crawler software used by Google that collects documents from the web to build a
searchable index for the Google Search engine. The name refers to two different types of web crawlers:
a desktop crawler (to simulate desktop users) and a mobile crawler (to simulate a mobile user).[1]

Original author(s): Google
Type: Web crawler
Website: Googlebot FAQ (https://developers.google.com/search/docs/advanced/crawling/googlebot)

Behavior
A website will probably be crawled by both Googlebot Desktop and Googlebot Mobile. However, Google
announced that, starting from September 2020, all sites were switched to mobile-first indexing, meaning
Google is crawling the web using a smartphone Googlebot.[2] The subtype of Googlebot can be identified by
looking at the user agent string in the request. However, both crawler types obey the same product token
(user agent token) in robots.txt, and so a developer cannot selectively target either Googlebot mobile or
Googlebot desktop using robots.txt.
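
As an illustration of the shared token, the following minimal sketch uses Python's standard `urllib.robotparser` module to evaluate a robots.txt file against the `Googlebot` token. The robots.txt content, the example.com URLs, and the helper name `googlebot_allowed` are hypothetical and chosen only for this example. Because there is a single token, the one rule group shown here would apply to both the desktop and the mobile crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule group for the shared "Googlebot" token
# and a permissive default group for every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the rules permit the Googlebot token to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    for path in ("/private/report.html", "/public/index.html"):
        url = "https://example.com" + path
        print(path, googlebot_allowed(ROBOTS_TXT, url))
```

The check is made against the bare token rather than a full user agent string, since the token is what robots.txt rule groups are matched on.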

If a webmaster chooses to restrict the information on their site that is available to Googlebot, or to
another spider, they can do so with the appropriate directives in a robots.txt file,[3] or by adding the
meta tag <meta name="Googlebot" content="nofollow" /> to the web page.[4] Googlebot requests to web
servers are identifiable by a user-agent string containing "Googlebot" and a host address containing
"googlebot.com".[5]

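The two identification checks on Googlebot requests (user-agent string and host address) can be sketched as follows. This is an illustrative sketch, not Google's verification procedure: the function names are hypothetical, and the host check uses a suffix match on ".googlebot.com", which is a stricter variant of "containing" chosen here so that a name such as googlebot.com.example.net is not accepted. The reverse lookup relies on the standard `socket.gethostbyaddr` call and needs network access.

```python
import socket


def looks_like_googlebot(user_agent: str, hostname: str) -> bool:
    """Apply both checks to values that are already known.

    Assumption of this sketch: the host name must end in ".googlebot.com".
    """
    return "Googlebot" in user_agent and hostname.lower().endswith(".googlebot.com")


def reverse_lookup(ip_address: str) -> str:
    """Return the host name for an IP address, or "" if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return ""


if __name__ == "__main__":
    # Hypothetical request data, for illustration only.
    agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(looks_like_googlebot(agent, "crawl-66-249-66-1.googlebot.com"))
```

A user-agent string on its own is easy to forge, which is why the sketch requires the host name check as well.
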
Currently, Googlebot follows HREF links and SRC links.[3] There is increasing evidence that Googlebot can
execute JavaScript and parse content generated by Ajax calls as well.[6] There are many theories regarding
how advanced Googlebot's ability to process JavaScript is, including the opinion that it has only minimal
ability derived from custom interpreters.[7] Currently, Googlebot uses a web rendering service (WRS) that
is based on the Chromium rendering engine (version 74 as of 7 May 2019).[8] Googlebot discovers pages by
harvesting every link on every page that it can find. Unless prohibited by a nofollow tag, it then follows
these links to other web pages. New web pages must be linked to from other known pages on the web in
order to be crawled and indexed, or be manually submitted by the webmaster.
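
The link-harvesting step can be sketched with Python's standard `html.parser` module. This is a simplified illustration, not Googlebot's actual implementation: the class and function names are hypothetical, only the `href` and `src` attributes are collected, and a tag is skipped when its `rel` attribute contains `nofollow`.

```python
from html.parser import HTMLParser


class LinkHarvester(HTMLParser):
    """Collect href and src values, skipping tags marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            return
        for name in ("href", "src"):
            value = attributes.get(name)
            if value:
                self.links.append(value)


def harvest_links(html: str) -> list[str]:
    """Return the followable link targets found in an HTML document."""
    harvester = LinkHarvester()
    harvester.feed(html)
    harvester.close()
    return harvester.links


if __name__ == "__main__":
    page = '<a href="/a">A</a><a rel="nofollow" href="/b">B</a><img src="/c.png">'
    print(harvest_links(page))
```

A real crawler would also resolve relative URLs against the page address and deduplicate targets before queueing them. Both steps are left out here for brevity.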

A problem that webmasters with low-bandwidth web hosting plans have often noted with Googlebot is
that it takes up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limit
and be taken down temporarily. This is especially troublesome for mirror sites, which host many gigabytes
of data. Google provides "Search Console", which allows website owners to throttle the crawl rate.[9]

How often Googlebot will crawl a site depends on the crawl budget. Crawl budget is an estimate of how
frequently a website is updated. Technically, Googlebot's development team (the Crawling and Indexing
team) uses several defined terms internally to cover what "crawl budget" stands for.[10] Since May 2019,
Googlebot has used the latest Chromium rendering engine, which supports ECMAScript 6 features. This
makes the bot more "evergreen" and ensures that it does not rely on a rendering engine that is outdated
compared to browser capabilities.[8]

Mediabot
Mediabot is the web crawler that Google uses for analyzing the content so Google AdSense can serve
contextually relevant advertising to a web page. Mediabot identifies itself with the user agent string
"Mediapartners-Google/2.1".

Unlike other crawlers, Mediabot does not follow links to discover new crawlable URLs, instead only
visiting URLs that have included the AdSense code.[11] Where that content resides behind a login, the
crawler can be given a login so that it is able to crawl the protected content.[12]

InspectionTool Crawlers
InspectionTool is the crawler used by Search testing tools such as the Rich Result Test and URL inspection
in Google Search Console. Apart from the user agent and user agent token, it mimics Googlebot.[13]

A guide to the crawlers was independently published.[14] It details four distinct crawler agents based
on web server directory index data: one non-Chrome crawler and three Chrome crawlers.

References
1. "Googlebot" (https://support.google.com/webmasters/answer/182072?hl=en). Google. 2019-03-11. Retrieved 2019-03-11.
2. "Announcing mobile first indexing for the whole web" (https://developers.google.com/search/blog/2020/03/announcing-mobile-first-indexing-for). Google Developers. Retrieved 2021-03-17.
3. "Google Search Console" (https://search.google.com/search-console/about). Google.com.
4. "Google Search Console" (https://search.google.com/search-console/about). search.google.com. Retrieved 2019-03-11.
5. "What is Googlebot | Google Search Central | Documentation" (https://developers.google.com/search/docs/advanced/crawling/googlebot). May 2022.
6. "Understand the JavaScript SEO basics | Search for Developers" (https://developers.google.com/search/docs/guides/javascript-seo-basics). Google Developers. Retrieved 2020-07-26.
7. Splitt, Martin. "How Google Search indexes JavaScript sites - JavaScript SEO" (https://www.youtube.com/watch?v=LXF8bM4g-J4). YouTube. Archived (https://ghostarchive.org/varchive/youtube/20211212/LXF8bM4g-J4) from the original on 2021-12-12.
8. "The new evergreen Googlebot" (https://webmasters.googleblog.com/2019/05/the-new-evergreen-googlebot.html). Official Google Webmaster Central Blog. Retrieved 2019-06-07.
9. "Google - Webmasters" (https://www.google.com/webmasters/). Retrieved 2012-12-15.
10. "What Crawl Budget Means for Googlebot" (https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html). Official Google Webmaster Central Blog. Retrieved 2018-07-04.
11. "About the AdSense Crawler" (https://support.google.com/adsense/answer/99376?hl=en&ref_topic=1348129).
12. "Display ads on login-protected pages" (https://support.google.com/adsense/answer/161351).
13. "Google Crawler (User Agent) Overview" (https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers).
14. "The Ultimate Guide to the New InspectionTool Crawlers" (https://strategicmarketinghouse.com/the-ultimate-guide-to-the-new-inspectiontool-crawlers/).

External links
Google's official Googlebot FAQ (https://developers.google.com/search/docs/advanced/crawling/googlebot)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Googlebot&oldid=1165504046"
