Session 6: Structure of the Web: Hubs & Authorities, PageRank (Ch. 13 and 14)
Broder et al. (2000)
LINK ANALYSIS AND WEB SEARCH
Based on Kleinberg (1998)
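Hub and Authority scores, step 1: Initialize each page's authority score to its in-link count.
Hub and Authority scores, step 2: Assign the authority scores back to the hubs; each hub's score is the sum of the authority scores of the pages it links to.
Hub and Authority scores, step 3: Repeat step 1 with the hub scores as weights; each page's new authority score is the sum of the hub scores of the pages that link to it.
Normalize by dividing each authority score by the total of the authority scores over the whole network.
The hub scores can be normalized in the same way.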
See EK ch. 14 for a proof that the hub and authority scores converge to a limit as the updates are repeated many times.
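The repeated hub and authority updates with normalization can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function name `hits`, the fixed iteration count, and the four-edge example graph are all assumptions made for the example.

```python
def hits(edges, iterations=50):
    """Repeat the authority and hub updates on a list of (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}  # every hub score starts at 1
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking to each page.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of the pages each page links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize so that each set of scores sums to 1 over the whole network.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {node: value / auth_total for node, value in new_auth.items()}
        hub = {node: value / hub_total for node, value in new_hub.items()}
    return hub, auth


# Hypothetical graph: three hub pages pointing at two candidate authorities.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1")]
hub, auth = hits(edges)
print(auth)
print(hub)
```

With all hub scores starting at 1, the first authority update reproduces the in-link counts, which is why the slides begin from those counts.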
Based on Page, Brin, Motwani and Winograd (1998)
PAGERANK
PageRank
http://ilpubs.stanford.edu:8090/422/
http://infolab.stanford.edu/~backrub/google.html
Step  A     B     C     D     E     F     G     H
0     1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
1     1/2   1/16  1/16  1/16  1/16  1/16  1/16  1/8
2     5/16  1/4   1/4   1/32  1/32  1/32  1/32  1/16
3     5/32  5/32  5/32  1/8   1/8   1/8   1/8   1/32
Equilibrium values: Limit values of PageRank iterations
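The table rows can be reproduced with exact fractions. The slide's network figure is not part of the text, so the link structure `LINKS` below is an assumption, chosen because it yields exactly the step 1 to 3 values in the table; the function names are likewise illustrative.

```python
from fractions import Fraction

# Assumed link structure (the original figure is missing); it is consistent
# with every row of the table.
LINKS = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": ["A", "H"],
    "E": ["A", "H"],
    "F": ["A"],
    "G": ["A"],
    "H": ["A"],
}


def basic_update(rank, links):
    """Basic PageRank Update Rule: each page splits its score equally over its out-links."""
    new_rank = {node: Fraction(0) for node in links}
    for node, targets in links.items():
        share = rank[node] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank


def run(links, steps):
    """Return the list of PageRank assignments for steps 0..steps."""
    rank = {node: Fraction(1, len(links)) for node in links}
    history = [rank]
    for _ in range(steps):
        rank = basic_update(rank, links)
        history.append(rank)
    return history


history = run(LINKS, 3)
for step, rank in enumerate(history):
    print(step, " ".join(str(rank[node]) for node in "ABCDEFGH"))
```

Under this assumed graph, running many more steps brings the values close to 4/13 for A, 2/13 for B and C, and 1/13 for each of the remaining nodes; these values are left unchanged by a further update, which is what makes them equilibrium values.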
Need for damping PageRank (scaling down the PageRank scores)
Trapping effect of loops: a set of pages that link only among themselves will
eventually siphon away all the PageRank.
Scaled PageRank Update: First apply the Basic PageRank Update Rule. Then
scale down all PageRank values by a factor s between 0 and 1 (typically
between 0.8 and 0.9). Divide the residual 1 - s units of PageRank equally over
all nodes, giving (1 - s)/n to each.
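A minimal sketch of the Scaled PageRank Update follows, compared against the basic rule (s = 1) on a graph with a trap. The graph `TRAP_LINKS` is hypothetical: it is an eight-node example in which F and G link only to each other. The function names and the step count are also assumptions.

```python
def scaled_update(rank, links, s=0.8):
    """Apply the basic update, scale by s, then give (1 - s)/n to every node."""
    n = len(links)
    new_rank = {node: 0.0 for node in links}
    for node, targets in links.items():
        share = rank[node] / len(targets)  # assumes every node has an out-link
        for target in targets:
            new_rank[target] += share
    return {node: s * value + (1 - s) / n for node, value in new_rank.items()}


def iterate(links, s, steps):
    """Start from equal scores 1/n and apply the scaled update `steps` times."""
    rank = {node: 1.0 / len(links) for node in links}
    for _ in range(steps):
        rank = scaled_update(rank, links, s)
    return rank


# Hypothetical graph with a loop: F and G link only to each other.
TRAP_LINKS = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": ["A", "H"],
    "E": ["A", "H"],
    "F": ["G"],
    "G": ["F"],
    "H": ["A"],
}

basic = iterate(TRAP_LINKS, s=1.0, steps=200)   # s = 1 is the basic rule
damped = iterate(TRAP_LINKS, s=0.8, steps=200)
print("basic :", {node: round(value, 3) for node, value in basic.items()})
print("damped:", {node: round(value, 3) for node, value in damped.items()})
```

With s = 1 nearly all the PageRank ends up on F and G, while with s = 0.8 every node keeps at least (1 - s)/n.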
Adjacency matrix: Mij
The entry Mij is equal to 1 if there is a link from node i to node j, and Mij = 0 otherwise.
Update rule for hubs: Use the adjacency matrix M, so that h ← M a.
Update hub hi to be the sum of aj over all nodes j that i has an edge to.
Representing the link structure as an adjacency matrix turns the Hub and Authority Update Rules into matrix-vector multiplications.
To update authorities, the scores flow in the opposite direction, so use Mji instead of Mij, which forms the transpose M^T: a ← M^T h.
Update authority ai to be the sum of hj over all nodes j that have an edge to i.
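The two matrix-vector forms can be checked on a small example. The 4-node adjacency matrix below is hypothetical, and `mat_vec` and `transpose` are illustrative helpers written in plain Python rather than calls into any particular library.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns, turning Mij into Mji."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical graph: M[i][j] = 1 if node i links to node j.
M = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

h = [1, 1, 1, 1]
a = mat_vec(transpose(M), h)  # a_i = sum of h_j over nodes j with an edge to i
h = mat_vec(M, a)             # h_i = sum of a_j over nodes j that i has an edge to
print("authorities:", a)
print("hubs:", h)
```

Because every hub score starts at 1, the first authority vector is just the column sums of M, which are the in-link counts.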
PageRank: Use Nji, which forms the transpose N^T, so the update is r ← N^T r.
Update PageRank score ri to be the sum of Nji rj over all nodes j that have an edge to i. The matrix structure is
similar to the authority update, except that in PageRank the entries Nji are fractions representing “portions of flow” from node j to i: Nji = 1/(number of out-links of j) if j links to i, and 0 otherwise.
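As a sketch of this matrix form, the flow matrix N can be built from an adjacency matrix by dividing each row by its out-degree. The 4-node matrix and the helper names `flow_matrix` and `pagerank_step` are assumptions made for illustration.

```python
from fractions import Fraction


def flow_matrix(adjacency):
    """N[i][j] = 1/outdegree(i) if i links to j, else 0 (assumes outdegree > 0)."""
    flows = []
    for row in adjacency:
        out_degree = sum(row)
        flows.append([Fraction(entry, out_degree) for entry in row])
    return flows


def pagerank_step(flows, rank):
    """One basic update r <- N^T r: r_i = sum over j of N[j][i] * r_j."""
    n = len(flows)
    return [sum(flows[j][i] * rank[j] for j in range(n)) for i in range(n)]


# Hypothetical graph: M[i][j] = 1 if node i links to node j.
M = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

N = flow_matrix(M)
r = [Fraction(1, 4)] * 4  # equal initial PageRank 1/n
r = pagerank_step(N, r)
print([str(value) for value in r])
```

Each row of N sums to 1, so the update moves PageRank around without creating or destroying any of it.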
Scaled-down (damped) PageRank, to address the problem of loops
s = 0.8
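In matrix form, the scaled update can be written as follows. The symbol \tilde{N} is introduced here for illustration, and the identity relies on the PageRank scores summing to 1.

```latex
\tilde{N}_{ij} = s\,N_{ij} + \frac{1 - s}{n},
\qquad
r \leftarrow \tilde{N}^{T} r,
\qquad
r_i \leftarrow s \sum_{j} N_{ji}\, r_j + \frac{1 - s}{n}.
% With s = 0.8: \tilde{N}_{ij} = 0.8\,N_{ij} + 0.2/n.
```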