
Session 6: Structure of the Web: Hubs & Authorities, PageRank
(ch. 13 and 14)

IDS 564, Prof. Ali Tafti

Broder et al. (2000)
LINK ANALYSIS AND WEB SEARCH

HUBS AND AUTHORITIES

Based on Kleinberg (1998)

Hub and Authority scores: Initial authority scores are the in-link counts

Hub and Authority scores: Assign authority scores back to hubs (a hub's score is the sum of the authority scores of the pages it links to)

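Hub and Authority scores: Repeat the authority step, calculating new authority scores (each page's authority score is the sum of the hub scores of the pages linking to it)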

Normalize by dividing each authority score by the total of all authority scores over the whole network
Hub scores can be normalized the same way
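The procedure on these slides (authority scores from in-links, hub scores from authorities, repeat, normalize) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name `hits`, the edge-list input format, and the small example network are hypothetical, not taken from the slides. Every hub score starts at 1, so the first authority update reproduces the in-link counts; both vectors are normalized to sum to 1 after each round, which changes only their scale, not their relative values.

```python
from typing import Dict, List, Tuple


def hits(edges: List[Tuple[str, str]], k: int) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Run k rounds of the hub/authority updates on a list of (source, target) links."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    hub = {n: 1.0 for n in nodes}   # every hub score starts at 1
    auth = {n: 0.0 for n in nodes}
    for _ in range(k):
        # Authority update: sum of the hub scores of the pages that link to n.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        # Hub update: sum of the authority scores of the pages n links to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        # Normalize: divide by the total over the whole network.
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {n: x / a_total for n, x in auth.items()}
        hub = {n: x / h_total for n, x in hub.items()}
    return hub, auth


# Hypothetical example: three hubs pointing at three candidate authorities.
edges = [("h1", "A"), ("h1", "B"), ("h2", "A"), ("h3", "A"), ("h3", "C")]
hub, auth = hits(edges, k=1)
print(auth["A"], hub["h1"])
```

After one round, A has 3 of the 5 in-links, so its normalized authority score is 3/5; hub h1 points to A and B and collects 4 of the 11 units of raw hub score.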

See EK ch. 14 for a proof that the normalized hub and authority
scores converge to limiting values as the hub and authority
updates are repeated many times.

PAGERANK

Based on Page, Brin, Motwani and Winograd (1998)

PageRank
http://ilpubs.stanford.edu:8090/422/
http://infolab.stanford.edu/~backrub/google.html

Basic PageRank values of nodes A to H over the first three update steps:

Step  A     B     C     D     E     F     G     H
0     1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
1     4/8   1/16  1/16  1/16  1/16  1/16  1/16  1/8
2     5/16  2/8   2/8   1/32  1/32  1/32  1/32  1/16
3     5/32  5/32  5/32  1/8   1/8   1/8   1/8   1/32
Equilibrium values: Limit values of PageRank iterations
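A minimal sketch of the Basic PageRank Update that reproduces the table of values above: every page starts with 1/n, then each step splits a page's PageRank equally across its out-links. The slide's figure is not available here, so the link structure in `LINKS` is an assumption: it was chosen because it reproduces every row of the table (it appears to be the eight-node example in EK ch. 14). The function names are hypothetical, and exact `Fraction` arithmetic is used so the output can be compared with the table directly.

```python
from fractions import Fraction

# Assumed link structure (the slide's figure is not available); it reproduces the table.
LINKS = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"], "D": ["A", "H"],
    "E": ["A", "H"], "F": ["A"], "G": ["A"], "H": ["A"],
}


def basic_pagerank_step(links, rank):
    """Basic PageRank Update: each page splits its PageRank equally over its
    out-links; a page's new value is the total it receives."""
    new_rank = {node: Fraction(0) for node in links}
    for node, targets in links.items():
        share = rank[node] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank


def basic_pagerank(links, steps):
    """Return the PageRank values after step 0, 1, ..., steps."""
    rank = {node: Fraction(1, len(links)) for node in links}  # step 0: 1/n each
    history = [rank]
    for _ in range(steps):
        rank = basic_pagerank_step(links, rank)
        history.append(rank)
    return history


for step, rank in enumerate(basic_pagerank(LINKS, 3)):
    print(step, [str(rank[node]) for node in "ABCDEFGH"])
```

`Fraction` prints reduced values, so 4/8 appears as 1/2 and 2/8 as 1/4. Running many more steps moves the values toward the equilibrium: for this assumed network they approach A = 4/13, B = C = 2/13, and 1/13 for each of D through H, values that satisfy the update rule exactly.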

Using PageRank for search (Brin and Page 1998):


1) Retrieve all documents that contain the list of keywords
2) Sort them by PageRank
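The two-step search procedure can be illustrated with a toy sketch. It is not how a production engine works; the function `search`, the whitespace-based keyword matching, and the sample pages and PageRank values are all hypothetical.

```python
def search(query_terms, documents, pagerank):
    """Toy two-step search: (1) keep the pages whose text contains every
    query keyword, (2) sort those pages by PageRank, highest first."""
    terms = [term.lower() for term in query_terms]
    matches = [
        page for page, text in documents.items()
        if all(term in text.lower().split() for term in terms)
    ]
    return sorted(matches, key=lambda page: pagerank[page], reverse=True)


# Hypothetical pages and PageRank values.
docs = {"p1": "web search engines", "p2": "search the web", "p3": "cooking recipes"}
pr = {"p1": 0.2, "p2": 0.5, "p3": 0.3}
print(search(["web", "search"], docs, pr))
```

Both p1 and p2 contain the two keywords, and p2 is listed first because its PageRank is higher.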

Need for damping PageRank (scaling down the PageRank scores)

Trapping effect of loops: a set of pages with links into it but no links out
will eventually siphon away all of the PageRank
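To see the trapping effect, the sketch below runs the Basic PageRank Update on a hypothetical four-page network in which C and D link only to each other. PageRank can flow into the C-D loop but never out, so after enough steps essentially all of it sits there and A and B are left with almost none. The network and the helper name are illustrative assumptions.

```python
def basic_pagerank_step(links, rank):
    """Basic PageRank Update: each page splits its PageRank equally over its out-links."""
    new_rank = {node: 0.0 for node in links}
    for node, targets in links.items():
        for target in targets:
            new_rank[target] += rank[node] / len(targets)
    return new_rank


# Hypothetical network: C and D link only to each other, so they form a trap.
links = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": ["C"]}
rank = {node: 1 / len(links) for node in links}   # start with 1/n each
for _ in range(100):
    rank = basic_pagerank_step(links, rank)

trapped = rank["C"] + rank["D"]
print(f"share held by the C-D loop after 100 steps: {trapped:.6f}")
```

Within the loop the values keep swapping between C and D, but their combined share converges to 1.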

Scaled PageRank Update: First apply the Basic PageRank Update Rule. Then
scale down all PageRank values by a factor s between 0 and 1 (typically
between 0.8 and 0.9). Divide the residual 1 - s units of PageRank equally over
all n nodes, giving (1 - s)/n to each.
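A minimal sketch of the Scaled PageRank Update, applied to a hypothetical four-page network with a trap (C and D link only to each other). It assumes every page has at least one out-link; the function name and the default number of steps are illustrative choices, not from the slides. With s = 0.8, pages A and B keep a positive share of the PageRank instead of losing it all to the loop.

```python
def scaled_pagerank(links, s=0.8, steps=100):
    """Scaled PageRank Update: apply the Basic PageRank Update, multiply every
    value by s, then give each of the n pages an extra (1 - s)/n.
    Assumes every page has at least one out-link."""
    n = len(links)
    rank = {node: 1 / n for node in links}
    for _ in range(steps):
        basic = {node: 0.0 for node in links}
        for node, targets in links.items():
            for target in targets:
                basic[target] += rank[node] / len(targets)
        rank = {node: s * basic[node] + (1 - s) / n for node in links}
    return rank


# Hypothetical network: C and D link only to each other (a trap).
links = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": ["C"]}
rank = scaled_pagerank(links, s=0.8)
print({node: round(value, 4) for node, value in rank.items()})
```

For this network the values settle near A = 7/68, B = 9/68, C = 27/68 and D = 25/68: the loop still collects most of the PageRank, but no longer all of it.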

Adjacency matrix: $M_{ij}$
The entry $M_{ij}$ is equal to 1 if there is a link from node $i$ to node $j$, and $M_{ij} = 0$ otherwise.

Update rule for hubs: Use adjacency matrix M.
By representing the link structure using an adjacency matrix, the Hub and Authority
Update Rules become matrix-vector multiplication.

Update rule for hubs: $h_i \leftarrow M_{i1} a_1 + M_{i2} a_2 + \cdots + M_{in} a_n$

Rewrite as: $h \leftarrow M a$

To update authorities, the scores flow in the opposite direction, so use $M_{ji}$ instead of $M_{ij}$;
these entries form the transpose $M^T$.
Update authority $a_i$ to be the sum of $h_j$ over all nodes $j$ that have an edge to $i$.

Update rule for authorities: $a_i \leftarrow M_{1i} h_1 + M_{2i} h_2 + \cdots + M_{ni} h_n$

Rewrite as: $a \leftarrow M^T h$
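The matrix form can be checked numerically. The NumPy sketch below (the function name `hits_matrix` and the 4-node matrix are hypothetical) applies $a \leftarrow M^T h$ and $h \leftarrow M a$ repeatedly, normalizing each vector to sum to 1, and compares the limiting authority scores with the principal eigenvector of $M^T M$, which is the limit derived in EK ch. 14. The comparison assumes the largest eigenvalue of $M^T M$ is unique, as it is for this example.

```python
import numpy as np


def hits_matrix(M, k):
    """k rounds of a <- M^T h and h <- M a, each vector normalized to sum to 1."""
    h = np.ones(M.shape[0])   # all hub scores start at 1
    a = np.zeros(M.shape[0])
    for _ in range(k):
        a = M.T @ h           # authority update
        h = M @ a             # hub update
        a = a / a.sum()
        h = h / h.sum()
    return h, a


# Hypothetical 4-node network: M[i, j] = 1 if node i links to node j.
M = np.array([[0, 1, 1, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
h, a = hits_matrix(M, k=50)

# Principal eigenvector of M^T M (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(M.T @ M)
principal = np.abs(eigvecs[:, -1])
principal = principal / principal.sum()

print(np.round(a, 4), np.allclose(a, principal))
```

Node 0 has no in-links, so its authority score is 0 in every round and in the eigenvector.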

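PageRank: Use $N_{ji}$, which forms $N^T$.
Here $N_{ij} = 1/\ell_i$ if node $i$ links to node $j$, where $\ell_i$ is the number of links out of $i$, and $N_{ij} = 0$ otherwise.
Update PageRank score $r_i$ to be the sum of $N_{ji} r_j$ over all nodes $j$ that have an edge to $i$. Note that the matrix structure is
similar to the authority update, except that in PageRank the $N_{ji}$ are fractions representing “portions of flow” from node $j$ to $i$.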

Basic PageRank update rule: $r_i \leftarrow N_{1i} r_1 + N_{2i} r_2 + \cdots + N_{ni} r_n$

Rewrite as: $r \leftarrow N^T r$
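In matrix form the Basic PageRank Update is a single product with $N^T$. The NumPy sketch below builds $N$ from the adjacency matrix $M$ by dividing each row by that node's number of out-links. It assumes every node has at least one out-link (a row of zeros would divide by zero). The eight-node link structure is an assumption, chosen because it reproduces the table of basic PageRank values given earlier; three updates should give that table's step-3 row.

```python
import numpy as np

nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
# Assumed link structure (reproduces the earlier table of basic PageRank values).
links = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"], "D": ["A", "H"],
         "E": ["A", "H"], "F": ["A"], "G": ["A"], "H": ["A"]}
idx = {name: k for k, name in enumerate(nodes)}

# Adjacency matrix: M[i, j] = 1 if node i links to node j.
M = np.zeros((len(nodes), len(nodes)))
for src, targets in links.items():
    for dst in targets:
        M[idx[src], idx[dst]] = 1.0

# N[i, j] = M[i, j] / (number of out-links of i), so each row sums to 1.
N = M / M.sum(axis=1, keepdims=True)

r = np.full(len(nodes), 1 / len(nodes))   # r<0>: 1/n for every node
for _ in range(3):
    r = N.T @ r                           # Basic PageRank update: r <- N^T r
print(dict(zip(nodes, np.round(r, 5))))
```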

Scaling down (damping) PageRank to handle the problem of loops

Example scaling factor: $s = 0.8$

Define $\tilde{N}_{ij} = s N_{ij} + (1 - s)/n$

Scaled PageRank update rule: $r_i \leftarrow \tilde{N}_{1i} r_1 + \tilde{N}_{2i} r_2 + \cdots + \tilde{N}_{ni} r_n$

Rewrite as: $r \leftarrow \tilde{N}^T r$

Repeated application: $r^{\langle k \rangle} = (\tilde{N}^T)^k \, r^{\langle 0 \rangle}$


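Convergence to PageRank (see EK ch. 14 for proof): $\tilde{N}^T r^{\langle * \rangle} = r^{\langle * \rangle}$
$r^{\langle * \rangle}$ is the PageRank vector, i.e., an eigenvector of $\tilde{N}^T$ with eigenvalue 1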
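A NumPy sketch of the damped version on a hypothetical four-node network in which C and D form a trap. It builds $\tilde{N} = sN + (1-s)/n$ with $s = 0.8$, computes $r^{\langle k \rangle} = (\tilde{N}^T)^k r^{\langle 0 \rangle}$ for k = 100 with `np.linalg.matrix_power`, and cross-checks the result against the eigenvector of $\tilde{N}^T$ for eigenvalue 1 from `np.linalg.eig`. The network is illustrative, and every node is assumed to have at least one out-link.

```python
import numpy as np

# Hypothetical network, nodes in order A, B, C, D; M[i, j] = 1 if node i links to node j.
# C and D link only to each other, so without damping they would trap all PageRank.
M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = M.shape[0]
s = 0.8

N = M / M.sum(axis=1, keepdims=True)    # N[i, j] = 1/l_i if i links to j
N_tilde = s * N + (1 - s) / n           # damped matrix; rows still sum to 1

r0 = np.full(n, 1 / n)
r_k = np.linalg.matrix_power(N_tilde.T, 100) @ r0   # r<k> = (N~^T)^k r<0>

# Cross-check: eigenvector of N~^T for the eigenvalue closest to 1, scaled to sum to 1.
eigvals, eigvecs = np.linalg.eig(N_tilde.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
r_star = v / v.sum()

print(np.round(r_k, 4))
print(np.allclose(N_tilde.T @ r_k, r_k), np.allclose(r_k, r_star))
```

Both checks should print True, and the limit should come out near 7/68, 9/68, 27/68 and 25/68 for A, B, C and D.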