
“PAGE RANK OF GOOGLE SEARCH: THE ALGORITHM THAT ORGANIZES THE WEB”

By: Peter Dadis Breboneria II


(Formerly Peter Reganit Breboneria II)

Faculty of Education
University of the Philippines Open University
Managed by: Tomas B. Cabagay Jr.
PageRank at Google Search is correlated with increased revenues in the internet
marketing business of Utak Henyo, a social enterprise that my marketing team
recently started. I became involved in social enterprise, a business model for charitable
or humanitarian purposes, to ensure the sustainability of the youth development programs at the
International Center for Youth Development (ICYD), a non-profit corporation and direct
partner of the Department of Education (DepEd) that I have founded and led since 2008.

The early World Wide Web (WWW) of the 1990s was chaotic, which led Page and
Brin to create the PageRank algorithm to organize and make sense of the oceanic
vastness of the web, and in turn to write the mission statement of Google: “to organize the world's
information and make it universally accessible and useful”. The first algorithm used in
Google Search was intended to rank all web pages globally by the relative importance
they carry in the search engine results. PageRank is named after Google founder
Lawrence Edward Page, who received a computer engineering degree from the University of
Michigan in 1995. He then pursued a doctoral degree at Stanford University (1998), where he
met Sergey Brin, his co-founder at Google. Page and Brin were interested in bringing order
to the gigantic, heterogeneous mass of data on the web and in maximizing the use of its link
structure and text. They studied the related work of Pitkow [Pit97], whose thesis examined
“World Wide Web Ecologies”; Weiss [WVS+96], who discussed clustering
methods that give weight to link structure; Kleinberg [Kle98], who developed an
insightful model of the web; and Spertus [Spe97], who discussed various applications of link
structure.

PageRank is one of the algorithms used in the Google search engine. Larry Page
described the perfect search engine as “understanding exactly what you mean and giving
you back exactly what you want”.

THE BASICS OF GOOGLE SEARCH

To materialize this project, Google Search today follows three (3) steps: “crawling”,
“indexing”, and “serving and ranking.”

Step 1: Crawling
Crawling is the process by which Googlebot, an automated program that navigates
the web like a spider, discovers and documents new web pages to be added to the Google
Index, a giant database stored on huge clusters of computers. The program decides which
pages or sites to visit and archive, and how frequently. Googlebot is also called a spider,
probably because the “web” of the World Wide Web (WWW), created by Tim Berners-
Lee at CERN in Switzerland in 1990, can be compared to a humongous spider
web. The spider documents new web page URLs and sitemap data from webmasters. It
navigates possible new links and ignores duplicate information. It detects pages
blocked in robots.txt, though such pages can still be recorded if another page or site links
to them. It cannot crawl pages inaccessible to anonymous users.
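The robots.txt rule described above can be sketched with Python's standard library. This is not Google's actual crawler, only a minimal illustration of how a crawler checks whether a site blocks a page; the rules below are a made-up example.

```python
from urllib import robotparser

# Parse a hypothetical robots.txt that blocks the /private/ directory.
rules = robotparser.RobotFileParser()
rules.parse("""User-agent: Googlebot
Disallow: /private/
""".splitlines())

# A polite crawler consults these rules before fetching each URL.
print(rules.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(rules.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
```

As the section notes, a blocked page can still end up recorded in the index if other pages link to it; robots.txt only stops the crawler from fetching the page itself.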

Step 2: Indexing
Indexing is the process of understanding the content of a page or site after it has been
discovered by the spider. Google analyzes the content through lexical analysis, categorizes
videos and images, and stores the results in the Google Search Index, a colossal collection of
servers that holds hundreds of billions of web pages and more than 100,000,000 gigabytes
of data. Indexing is comparable to the index at the back of a book in a university library,
which consists of lists of words. The spider can process most kinds of content except some rich media
files. Indexing is not limited to word analysis; it also uses other useful information such as
the locations and interests of web surfers. Nowadays Google Search does not just match
keywords from search entries but also helps users access millions of books
from major libraries and public data from institutions like the World Bank.
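The book-index analogy above can be made concrete with a toy inverted index: a mapping from each word to the set of pages containing it, so a lookup never has to scan every page. The two sample pages are invented for illustration; real indexing involves far more analysis.

```python
from collections import defaultdict

# Two tiny made-up "web pages" to index.
docs = {
    "page1": "pagerank organizes the web",
    "page2": "the web of links",
}

# Build the inverted index: word -> set of page ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

print(sorted(index["web"]))  # both pages mention "web"
```

Serving a query then starts from set lookups like `index["web"]` instead of re-reading every stored page, which is what makes web-scale search feasible.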

Step 3: Serving and Ranking

When a web surfer enters a keyword in Google Search, the system returns results
ranked by relevance. Matt Cutts, a quality engineer at Google, stated that their system
uses about 200 factors or criteria to ensure that the information sent back is germane to
the user's needs. Some of these factors are the “words of your query”, the “relevance and
usability of pages”, the “expertise of sources”, and “your location and settings”. Google
also monitors the user experience, including the speed of web pages and their
user-friendliness.

a. Meaning of your query

What information are you looking for? What is the intention behind the
query? Is it a specific or a broad search?

Google developed programs based on the latest research in Natural Language
Understanding (NLU) that can address these concerns, interpret spelling mistakes, and
classify various kinds of questions. It created synonym programs that match similar meanings;
the system took five years to finish and improved results by 30% across various languages.
To address the need for fresh information, Google also created freshness algorithms
that surface the latest information and trends, for example, PBA scores.

b. Relevance of webpages

What information is relevant to your query?

Google created search algorithms that detect quantifiable signals to estimate
relevance. The most basic indication of relevance is “when a page contains the same
keywords as your search query.” A limitation of the program is handling abstract
views, for example complex political or religious questions.
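The basic keyword signal quoted above can be sketched naively: score a page by how often it contains the query's words. This is only an illustration of the simplest signal; real ranking, as noted, combines roughly 200 factors.

```python
def keyword_score(query: str, page_text: str) -> int:
    """Count how many times the query's keywords appear in the page text."""
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

print(keyword_score("page rank", "Page rank orders pages by rank"))  # → 3
```

A score like this is easy to game by keyword stuffing, which is one reason link-based signals such as PageRank were needed on top of pure text matching.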

c. Quality of content

How trustworthy is the information?

The search algorithms have the capacity to determine whether the content demonstrates
“expertise, authoritativeness, and trustworthiness”. Google also uses spam
algorithms to identify low-quality pages and ensure that these links do not appear in
search results.

d. Usability of webpages

Is the page adaptable to various devices such as mobile phones, desktops, and
tablets? Is it viewable on a slow internet connection?

Google also created algorithms to evaluate whether a page is user-friendly. Since
January 2018, these programs have also considered the speed of the page.

e. Context and settings

What is most relevant and useful at the moment?

Google considers location and past search history when deciding which results
will serve the user best.

PAGE RANK: THE FIRST ALGORITHM

I have cited a couple of algorithms, such as the search, synonym, and spam algorithms,
and PageRank. PageRank is one of these algorithms, and the FIRST. As I mentioned earlier, the dark ocean of
the web led to the creation of PageRank. The original agenda of PageRank “was a way to sort
backlinks so if there were a large number of backlinks for a document, the ‘best’ backlinks
could be displayed first”. Every time a professor asks students to write a paper, he asks
everyone to cite their references. Every link on the web is like an academic citation or reference.
For certain, the Google homepage has massive numbers of backlinks pointing to it. In Figure 1, web pages A
and B are backward links of web page C, or equivalently, forward links pointing to C.

PageRank is described as follows: “A page has a high rank if the sum of the ranks of its
backlinks is high. This covers both the case when a page has many backlinks and when a
page has a few highly ranked backlinks.”

PageRank is defined as follows: “Let u be a web page. Then let Fu be the set of pages u
points to and Bu be the set of pages that point to u. Let Nu = |Fu| be the number of links
from u and let c be a factor used for normalization (so that the total rank of all web pages is
constant). We begin by defining a simple ranking, R, which is a slightly simplified version of
PageRank:

R(u) = c · Σ (v ∈ Bu) R(v)/Nv”

Let me show how to calculate PageRank using the power method. The link structure
in the example below is: A points to B and C; B points to D; C points to A, B, and D; and
D points to C, so every page starts with rank 0.25. In each iteration, each numerator is the
previous value of a page pointing to the target (an inward link), divided by that page's
number of outgoing links. For example, for page A on iteration 1, the value of page C, its
only inward link (0.25), is divided by C's 3 outgoing links.

Page  | Iteration 0 | Iteration 1                                     | Iteration 2                                        | Ranking
A     | 0.25        | 0.25(C)/3 = 0.083                               | 0.375(C)/3 = 0.125                                 | 4
B     | 0.25        | 0.25(A)/2 + 0.25(C)/3 = 0.125 + 0.083 = 0.2083  | 0.083(A)/2 + 0.375(C)/3 = 0.0415 + 0.125 = 0.1665  | 3
C     | 0.25        | 0.25(A)/2 + 0.25(D)/1 = 0.125 + 0.25 = 0.375    | 0.083(A)/2 + 0.333(D)/1 = 0.0415 + 0.333 = 0.3745  | 1
D     | 0.25        | 0.25(B)/1 + 0.25(C)/3 = 0.25 + 0.083 = 0.333    | 0.2083(B)/1 + 0.375(C)/3 = 0.2083 + 0.125 = 0.3333 | 2
Total | 1           | 1                                               | 1                                                  |

Figure 2 demonstrates the propagation of rank from one pair of pages to another.

Figure 3 shows a consistent steady-state solution for a set of pages.
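The iterations above can be carried to their steady state with a few lines of code. This is a minimal sketch, assuming the same four-page link structure used in the table (A → B, C; B → D; C → A, B, D; D → C) and the simplified ranking without a rank source term.

```python
# Link structure from the worked example: page -> list of pages it points to.
links = {"A": ["B", "C"], "B": ["D"], "C": ["A", "B", "D"], "D": ["C"]}

rank = {page: 0.25 for page in links}     # iteration 0: uniform start
for _ in range(200):                      # iterate toward the steady state
    new = {page: 0.0 for page in links}
    for page, outs in links.items():
        share = rank[page] / len(outs)    # each outlink gets an equal share
        for target in outs:
            new[target] += share
    rank = new

print(sorted(rank, key=rank.get, reverse=True))  # → ['C', 'D', 'B', 'A']
```

At the steady state the values settle at C = 0.375, D = 0.3125, B = 0.1875, and A = 0.125, which agrees with the ranking column of the table: C first, then D, B, and A.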


The simplified version of PageRank encountered issues such as Rank Sink and
Dangling Links. The Google team developed the Rank Source model to overcome rank
sink (see Figure 4). They described Rank Sink as follows:

“Consider two web pages that point to each other but to no other page. And suppose
there is some web page which points to one of them. Then, during iteration, this loop will
accumulate rank but never distribute any rank (since there are no outedges)”.

To overcome Rank Sink, they developed the formula for Rank Source:

R′(u) = c · Σ (v ∈ Bu) R′(v)/Nv + c · E(u)

where E(u) is a vector over the web pages that serves as a source of rank.

Dangling links are defined as “simply links that point to any page with no outgoing
link…. Because dangling links do not affect the ranking of any other page directly, we
simply remove them from the system until all the PageRanks are calculated. After all the
PageRanks are calculated, they can be added back in, without affecting things significantly.”
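The rank-sink problem and the rank-source fix can be sketched on a tiny made-up web: pages A and B point only to each other, and C points into the loop. The mixing constant below plays the role of the normalization in the rank-source formula; the exact value 0.85 is an illustrative choice, not taken from the original paper.

```python
# Hypothetical three-page web: A and B form a loop (a rank sink); C feeds it.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
pages = list(links)

d = 0.85                                  # weight on link-propagated rank (assumed)
rank = {p: 1 / len(pages) for p in pages}
for _ in range(100):
    # Every page gets a constant "rank source" share, like the E(u) term above,
    # so pages outside the loop (here C) never drain to zero.
    new = {p: (1 - d) / len(pages) for p in pages}
    for page, outs in links.items():
        for target in outs:
            new[target] += d * rank[page] / len(outs)
    rank = new

print(rank["C"] > 0)  # True: C keeps rank despite feeding the A-B loop
```

Without the source term, the A-B loop would accumulate all the rank while C's value decayed toward zero, which is exactly the behavior the quoted passage describes.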

To implement PageRank, they built a “complete crawling and indexing system”
(remember the index at the back of a book) that had a repository of 24 million pages in
1998. The original applications of PageRank were not just to rank all pages but also to
enable searching for high-quality results and essentials, to estimate traffic, and to assist
users in deciding which links in a long list are “more likely to be interesting”.

References

1. Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. “The PageRank
Citation Ranking: Bringing Order to the Web.” Stanford InfoLab Publication Server.
Stanford InfoLab, November 11, 1999. http://ilpubs.stanford.edu:8090/422/.

2. “How Google Search Works - Search Console Help.” Google. Accessed June 12, 2020.
https://support.google.com/webmasters/answer/70897. Video:
https://www.youtube.com/watch?v=BNHR6IQJGZs&t=22.

3. Cabagay, Tomas B. “Computations by Early Civilizations and History of Computers.”
Essay. In Introduction to Basic Computing. Laguna, PH: UP Open University, n.d.
