Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Google PageRank

Beat Signer
Global Information Systems Research Group
Department of Computer Science
ETH Zurich

Vrije Universiteit Brussel, August 25, 2008


Overview
▪ History of PageRank
▪ PageRank algorithm
▪ Examples
▪ Implications for website development

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 2


History of PageRank
▪ Developed as part of an academic Larry Page Sergey Brin

project at Stanford University


▪ research platform to aid understanding of large-scale
web data and enable researches to easily experiment
with new search technologies
▪ Larry Page and Sergey Brin worked on the project
about a new kind of search engine (1995-1998) which
finally led to a functional prototype called Google

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 3


Web Search Until 1998
▪ Find all documents using a query term
▪ use information retrieval (IR) solutions
▪ ranking based on "on the page factors"
→ problem: poor quality of search results (order)
▪ Page and Brin proposed to compute the
absolute qualtity of a page (PageRank)
▪ based on the number and quality of pages
linking to a page (votes)

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 4


PageRank P1 P5 P7 P8
R1 R5 R7 R8

P2 P3 P4 P6
R2 R3 R4 R6

▪ A page has a high PageRank R if


▪ there are many pages linking to it
▪ or, if there are some pages with a high PageRank
linking to it
▪ Total score = IR score x PageRank
Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 5
PageRank Algorithm
R ( Pj )
R ( Pi ) = 
Pj Bi Lj
P1 P2

1.5
1 1.5
1

▪ where
▪ Bi is the set of pages
that link to page Pi P3

▪ Lj is the number of 0.75


1
outgoing links for page Pj

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 6


Matrix Representation
▪ Let us define a hyperlink P1 P2
matrix H
1 L j if Pj  Bi
H ij = 
 0 otherwise
0 1 2 1 
and R = R(Pi )
P3
H = 1 0 0
→ R = HR 0 1 2 0

R is an eigenvector of H
with eigenvalue 1
Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 7
Matrix Representation ...
▪ We can use the power method to find R
t +1
R = HR t

0 1 2 1 

For our example H = 1 0 0 
 
0 1 2 0

this results in R = 2 2 1 or 0.4 0.4 0.2

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 8


Dangling Pages
▪ Problem with pages P1 P2
C
that have no outbound
links (P2)

0 0 C

H=  and R = 0 0
1 0
0 1 2 0 1 2
C=  and S = H + C =  
0 1 2 1 1 2
Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 9
Strongly Connected Pages (Graph)
1-d
▪ Add new transition P1 P2
probabilities between
all pages
▪ with probability d we follow
P4 P3
the hyperlink structure S
▪ with probability 1-d we
choose a random page
P5
1-d 1-d
G = (1 − d ) 1 + dS
1
R = GR
n
Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 10
G = (1 − d ) 1 + dS
1
Examples n

A2

0.37

A1 A3

0.26 0.37

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 11


G = (1 − d ) 1 + dS
1
Examples ... n

A2 B2

0.185 0.185

A1 A3 B1 B3

0.13 0.185 0.13 0.185

P( A) = 0.5 P(B ) = 0.5

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 12


G = (1 − d ) 1 + dS
1
Examples ... n

A2 B2

0.14 0.20

A1 A3 B1 B3

0.10 0.14 0.22 0.20

P( A) = 0.38 P(B ) = 0.62

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 13


G = (1 − d ) 1 + dS
1
Examples ... n

A2 B2

0.23 0.095

A1 A3 B1 B3

0.3 0.18 0.10 0.095

P( A) = 0.71 P(B ) = 0.29

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 14


G = (1 − d ) 1 + dS
1
Examples ... n

A2 B2

0.24 0.07

A1 A3 B1 B3

0.35 0.18 0.09 0.07

P( A) = 0.77 P(B ) = 0.23

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 15


G = (1 − d ) 1 + dS
1
Examples ... n

A2 B2

0.17 0.06

A1 A3 B1 B3

0.33 0.175 0.08 0.06

A4 P(B ) = 0.14

P( A) = 0.86
0.125

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 16


Implications for Website Development
▪ First make sure that your page gets indexed
▪ "on the page factors"
▪ Think about your site's internal link structure
▪ create many internal links for important pages
▪ be "careful" about where to put outgoing links
▪ Increase the number of pages
▪ Ensure that webpages are addressed consistently
▪ http://www.vub.ac.be  http://www.vub.ac.be/index.php
▪ Make sure that you get links from good websites
Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 17
Consistent Addressing of Webpages

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 18


Other Search Engine Optimisations (SEO)
▪ Internet marketing has become big business
▪ white hat and black hat optimisations
▪ Black hat optimisations
▪ link farms
▪ spamdexing in guestbooks etc.
<a rel="nofollow" href="…">…</a>
▪ selling/buying links
▪ …

▪ Is PageRank fair?
Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 19
Conclusions
▪ PageRank algorithm
▪ absolute quality of a page based on incoming links
▪ random surfer model
▪ computed as eigenvector of Google matrix G
▪ Implications for website development and SEO
▪ PageRank is just one (important) factor

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 20


References
▪ The PageRank Citation Ranking: Bringing Order
to the Web, L. Page, S. Brin, R. Motwani and
T. Winograd, January 1998
▪ The Anatomy of a Large-Scale Hypertextual
Web Search Engine, S. Brin and L. Page,
Computer Networks and ISDN Systems, 30(1-7),
April 1998

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch


References …
▪ PageRank Uncovered, C. Ridings and
M. Shishigin, September 2002
▪ PageRank Calculator,
http://www.webworkshop.net/pagerank_
calculator.php

Vrije Universiteit Brussel, August 25, 2008 Beat Signer, signer@inf.ethz.ch 22

You might also like