Evgeny Dantsin Web-Search Ranking Algorithms
Raw Idea
System of linear equations
Rank of page p:
$r(p) = \sum_{q \in B_p} \frac{r(q)}{|q|}$
where
$B_p$ = {all pages pointing to p}
$|q|$ = number of outgoing links on page q
Problems with solving
Large size of the system
Different solutions
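As a minimal sketch of the system above, consider a hypothetical three-page graph (the page names, the `links` dictionary, and the function name are illustrative assumptions, not taken from the slides). The code checks whether a candidate rank vector satisfies the rank equation for every page. It also shows that a scalar multiple of a solution is again a solution, which is one way such a system can have different solutions.

```python
from fractions import Fraction

# Hypothetical toy graph: page -> list of pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def satisfies_rank_equations(r, links):
    """Check r(p) == sum over q in B_p of r(q) / |q| for every page p."""
    for p in links:
        total = sum(
            Fraction(r[q]) / len(out)       # r(q) / |q|
            for q, out in links.items()
            if p in out                     # q belongs to B_p
        )
        if total != r[p]:
            return False
    return True

solution = {"A": Fraction(2, 5), "B": Fraction(1, 5), "C": Fraction(2, 5)}
scaled = {p: 10 * v for p, v in solution.items()}  # same ranks, rescaled

print(satisfies_rank_equations(solution, links))
print(satisfies_rank_equations(scaled, links))
```

Exact fractions are used here only to keep the equality check free of rounding error; a real system would be far too large for this kind of direct check.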
Random Surfer Model
Raw Version
Random walk:
A random surfer starts from a page chosen at random.
At each page, the surfer clicks on a link picked at random
and moves to the corresponding page.
Question: What is the probability that the surfer arrives at
page p at step t ?
Answer: The probability can be computed recursively
using
$r_t(p) = \sum_{q \in B_p} \frac{r_{t-1}(q)}{|q|}$
where $r_k(p)$ denotes the probability of arriving at page p at step k;
the initial probabilities $r_0(p)$ are defined to be 1/n.
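The recursion can be sketched in a few lines of Python. The graph and the function names below are hypothetical, and the sketch assumes every page has at least one outgoing link (the raw model does not say what happens otherwise).

```python
from fractions import Fraction

def step(r, links):
    """One step of the recursion: r_t(p) = sum over q in B_p of r_{t-1}(q) / |q|."""
    nxt = {p: Fraction(0) for p in links}
    for q, out in links.items():
        for p in out:
            nxt[p] += r[q] / len(out)   # q passes an equal share to each page it links to
    return nxt

def arrival_probabilities(links, t):
    """Probabilities r_t(p), starting from the uniform distribution r_0(p) = 1/n."""
    n = len(links)
    r = {p: Fraction(1, n) for p in links}
    for _ in range(t):
        r = step(r, links)
    return r

# Hypothetical toy graph: page -> list of pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
r2 = arrival_probabilities(links, 2)
print(r2)
```

Iterating over outgoing links and pushing shares forward computes the same sums as the formula, without building the sets of pointing pages explicitly.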
Random Walks, Markov Chains, and Ranking
Markov chains: The surfer's random walk can be
equivalently described in terms of Markov chains.
Ranking: The ranks are stationary probabilities of the
surfer's Markov chain:
$r(p) = \lim_{k \to \infty} r_k(p)$ = frequency of visiting page p
Problem: The process may not converge; stationary
probabilities may not exist!
Google's solution: Slight adjustment of the Markov chain
to guarantee convergence.
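A small example of the convergence problem, under assumptions of my own: the three-page graph below is hypothetical and chosen so that the raw random walk, started from the uniform distribution, alternates between two distributions forever instead of settling down.

```python
from fractions import Fraction

def step(r, links):
    """One step of the raw random walk: each page splits its probability over its links."""
    nxt = {p: Fraction(0) for p in links}
    for q, out in links.items():
        for p in out:
            nxt[p] += r[q] / len(out)
    return nxt

# Hypothetical graph: A and B point to each other, C points to A, nothing points to C.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}

r = {p: Fraction(1, len(links)) for p in links}   # uniform start, 1/n each
history = []
for _ in range(4):
    r = step(r, links)
    history.append(r)

for t, dist in enumerate(history, start=1):
    print(t, dist)
```

Steps 1 and 3 give the same distribution, steps 2 and 4 give another, so the limit over k does not exist for this graph.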
Adjusting the Random Surfer Model
Idea of adjustment: Every page must be reachable from
every other page (irreducibility).
Adjusted random walk:
A random surfer starts from a page chosen at random.
At each page, the surfer proceeds as follows:
1. With probability $1 - \varepsilon$, the surfer clicks on a link picked at
random and moves to the corresponding page.
2. With probability $\varepsilon$, the surfer gets bored and jumps to a page
chosen at random from all other pages.
The probability $\varepsilon$ is called the jumping probability; its value
affects the rate of convergence.
Adjusted recursive relation:
$PR_t(p) = (1 - \varepsilon) \sum_{q \in B_p} \frac{PR_{t-1}(q)}{|q|} + \frac{\varepsilon}{n}$
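The adjusted relation might be iterated as in the sketch below. The parameter names (`eps` for the jumping probability, `tol`, `max_iter`), the stopping rule, and the toy graph are assumptions for illustration. The sketch also assumes every page has at least one outgoing link, since the slides do not say how pages without links are handled.

```python
def pagerank(links, eps=0.15, tol=1e-8, max_iter=1000):
    """Iterate PR_t(p) = (1 - eps) * sum over q in B_p of PR_{t-1}(q)/|q| + eps/n.

    Assumes every page in `links` has at least one outgoing link.
    Returns the rank dictionary and the number of iterations performed.
    """
    n = len(links)
    pr = {p: 1.0 / n for p in links}               # uniform start
    for iteration in range(1, max_iter + 1):
        nxt = {p: eps / n for p in links}          # jumping term
        for q, out in links.items():
            share = (1 - eps) * pr[q] / len(out)   # link-following term
            for p in out:
                nxt[p] += share
        change = sum(abs(nxt[p] - pr[p]) for p in links)
        pr = nxt
        if change < tol:                           # stop once successive iterates agree
            break
    return pr, iteration

# Hypothetical graph on which the unadjusted walk oscillates.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks, iterations = pagerank(links)
print(ranks, iterations)
```

On this graph the unadjusted walk never converges, while the adjusted iteration does; page C, which no page links to, ends up with exactly the jumping term eps/n.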
Computation Issues
Only a few are mentioned here.
Convergence and accuracy:
Trade-off when choosing the jumping probability $\varepsilon$: the smaller
$\varepsilon$, the better the adequacy of the ranking, but the slower the convergence.
Google's original value was $\varepsilon = 0.15$ (convergence: about
100 iterations, accuracy: about $10^{-8}$).
PageRank updating: Special algorithms are used to avoid
total recomputation of PageRank.
PageRank at Google Toolbar: Ranks are displayed using
a logarithmic scale.
Google's distribution: In the jumping term, Google uses
its own distribution instead of the uniform one. The choice of
the jumping distribution affects neither mathematical nor
computational aspects, but it allows Google to (slightly)
manipulate ranks.
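A rough back-of-the-envelope check of the iteration count, under an assumption not stated in the slides: the error of the iteration shrinks by a factor of at most (1 - eps) per step, which is a standard bound for this kind of adjusted chain. The function name is hypothetical.

```python
import math

def iterations_needed(eps, accuracy):
    """Smallest k with (1 - eps)**k <= accuracy, assuming the error
    shrinks by a factor of (1 - eps) per iteration."""
    return math.ceil(math.log(accuracy) / math.log(1 - eps))

k = iterations_needed(0.15, 1e-8)
print(k)
```

With a jumping probability of 0.15 and a target accuracy of 1e-8 this gives 114, in line with the figure of about 100 iterations. A larger jumping probability needs fewer iterations, which is the trade-off described above.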
Strength or Weakness?
PageRank is query-independent.
Strength
Efficiency. Ranks are computed in advance, not during
query processing.
Page owners have very little chance of affecting their
PageRank scores.
Weakness
PageRank does not distinguish between pages that are
authoritative in general and pages that are authoritative on
the query topic.
Idea of HITS
Observation. It is not necessary that an authoritative page
points to other authoritative pages. Instead, there are
special pages that act as hubs that contain collections of
links to authoritative pages.
Two weights. Every page has two weights:
The hub weight captures the quality of the page as a
pointer to useful resources.
The authority weight captures the quality of the page as a
resource itself.
Mutual reinforcement. A good hub points to good authorities,
while a good authority is pointed to by good hubs.
Iterative Computation
1. Initial step. All hub and authority weights are set to 1:
$h_0(p) = a_0(p) = 1$ for all pages p
2. Updating. At each iteration, the weights are updated using
$a_t(p) = \sum_{q \in B_p} h_{t-1}(q)$ and $h_t(p) = \sum_{q \in F_p} a_t(q)$
where $B_p$ = {all pages pointing to p} and $F_p$ = {all pages that p points to}.
3. Normalization. After each iteration, a normalization
operation is applied so that the vectors of weights become
unit vectors.
4. Convergence. The algorithm iterates until the vectors
converge (the convergence has been proved).
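The four steps can be sketched as follows. The function names, the fixed iteration count (used here in place of a convergence test), the choice of the Euclidean norm, and the toy graph are assumptions for illustration. The authority update sums hub weights over the pages pointing to p, and the hub update sums the fresh authority weights over the pages that p links to.

```python
import math

def normalize(weights):
    """Scale a weight vector to unit Euclidean length (all zeros stay zero)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {p: (w / norm if norm else 0.0) for p, w in weights.items()}

def hits(links, iterations=50):
    """Return (hub, authority) weights after a fixed number of iterations.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    hub = {p: 1.0 for p in links}    # step 1: all weights start at 1
    auth = {p: 1.0 for p in links}
    for _ in range(iterations):
        # step 2a: authority of p = sum of hub weights of pages pointing to p
        new_auth = {p: 0.0 for p in links}
        for q, out in links.items():
            for p in out:
                new_auth[p] += hub[q]
        # step 2b: hub of p = sum of the new authority weights of pages p points to
        new_hub = {p: sum(new_auth[q] for q in links[p]) for p in links}
        # step 3: normalize both vectors to unit length
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth

# Hypothetical graph: H1 and H2 act as hubs, X and Y as authorities.
links = {"H1": ["X", "Y"], "H2": ["X"], "X": [], "Y": []}
hub, auth = hits(links)
print(hub)
print(auth)
```

In this graph X is pointed to by both hubs and Y by only one, so X ends up with the larger authority weight, and H1, which points to both authorities, ends up with the larger hub weight.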
More About HITS
Smaller graph. PageRank operates on the entire Web
graph. HITS operates on a smaller graph G that is built in
two steps:
1. Include in G all pages that contain the query keywords.
2. Add all pages that either point to pages in G or are pointed
to by pages in G.
Query dependence. Unlike PageRank, HITS is
query-dependent.
HITS in search engines. HITS is used in Ask Jeeves
(www.ask.com).
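The two-step construction of the smaller graph can be sketched as below. The function name, the `contains_keywords` predicate, and the toy link structure are hypothetical; a real system would obtain the first set from a text index rather than from a predicate over page names.

```python
def build_hits_graph(links, contains_keywords):
    """Build the query-specific graph G from a full link structure.

    `links` maps each page to the list of pages it links to;
    `contains_keywords(page)` says whether a page matches the query.
    """
    # Step 1: pages that contain the query keywords.
    root = {p for p in links if contains_keywords(p)}
    graph = set(root)
    # Step 2: add pages pointed to by, or pointing to, pages from step 1.
    for q, out in links.items():
        if q in root:
            graph.update(out)
        if root.intersection(out):
            graph.add(q)
    # Keep only the links whose endpoints both lie in G.
    return {p: [t for t in links.get(p, []) if t in graph] for p in graph}

# Hypothetical link structure; only page "b" matches the query.
links = {"a": ["b"], "b": ["c"], "c": [], "d": ["b"], "e": ["a"]}
g = build_hits_graph(links, lambda page: page == "b")
print(sorted(g))
```

Page "e" is left out because it neither matches the query nor is directly linked with a matching page, which is what keeps G much smaller than the whole Web graph.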
Conclusions
HITS and PageRank are web-search ranking algorithms
that exploit math techniques (linear algebra, Markov
chains).
Skeptical about the role of theory in information
technology? Think about Google's success.
James Clerk Maxwell: "There is nothing more practical
than a good theory."