
The Long Tail

(Social Media Analytics)


Deepayan Chakrabarti (deepay@utexas.edu)



Logistics
 For the next class, install Gephi (gephi.org)!



Popularity in social networks
 Who’s most popular?
 Justin Bieber [Twitter]
 Kim Kardashian [Instagram]
 …

 What is the pattern of popularity in social networks?


 How many super-popular folks compared to “normal” folks?
 What is “normal”?
 How does popularity change over time?



Degrees and Distributions
 Degree = number of friends

[Figure: example network with one node labeled "degree = 1" and another labeled "degree = 3"]

 But what happens on Twitter?
 Friendship need not be reciprocated
 The people I follow need not be the ones who follow me

 Facebook is an undirected network (considering only “friendships”)
 Twitter and Instagram are directed networks
Degrees and Distributions
 In directed networks:
 Two different notions of degree

 In-degree = number of people who follow you


 Out-degree = number of people you follow

In-degree = 6
Out-degree = 3

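The two notions of degree can be counted directly from a list of "who follows whom". The sketch below is only an illustration: the account names and the edge list are made up, and `degrees` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Hypothetical follow edges, written as (follower, followed). Names are made up.
follows = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("bob", "ann"), ("bob", "cat"), ("ann", "cat"),
]

def degrees(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for follower, followed in edges:
        out_deg[follower] += 1   # the follower follows one more account
        in_deg[followed] += 1    # the followed account gains one follower
    return in_deg, out_deg

in_deg, out_deg = degrees(follows)
print("bob: in-degree", in_deg["bob"], "out-degree", out_deg["bob"])
```

In an undirected network such as Facebook friendships, the two counters would be identical, which is why a single "degree" suffices there.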


Degrees and Distributions
 Degree is a property of each person
 How can we characterize the entire network?

 Average degree?
 Median degree?

 If I told you:
 The average height of human males is 5’10”
 I pick someone at random
 What is this person’s height?
 A guess of 5’10” is probably not too far from the truth
Degrees and Distributions
 A guess of 5’10” is probably not too far from the truth
 Why?
 Most guys are nearly average
 Distribution is nearly symmetric around the average value

[Figure: distribution of male heights; x-axis: Height (in cm), y-axis: How many males; the average is marked]


Degrees and Distributions
 Is the average population of a town always a good guess?
 Most towns have low pop, but:
 many atypical cities
 distribution is not symmetric
 “right skew”

[Figure: distribution of city populations; x-axis: City population, y-axis: Percentage of cities; Duffield, VA (pop 89) and New York City are marked]

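A quick way to see why the average works for heights but not for town populations is to compare the mean with the median. This is a minimal sketch on made-up numbers: only the population of 89 comes from the slide, and every other value is an assumption chosen for illustration.

```python
from statistics import mean, median

# Made-up, roughly symmetric sample of heights in cm.
heights_cm = [170, 174, 176, 178, 178, 180, 182, 186]

# Made-up, right-skewed sample of town populations (89 is from the slide).
town_pops = [89, 300, 450, 800, 1200, 2500, 9000, 8_000_000]

for name, data in [("heights", heights_cm), ("town populations", town_pops)]:
    print(f"{name}: mean = {mean(data):,.0f}, median = {median(data):,.0f}")
```

For the symmetric sample the mean and median coincide, so the average is a sensible guess. For the skewed sample a single huge city pulls the mean far above what a typical town looks like.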


Degrees and Distributions
 In a social network…
 Most users have very few followers
 but several are extremely popular

[Figure: distribution of node degree; x-axis: Node degree (e.g., number of Twitter followers), y-axis: Percentage of users; “Me” and “Justin Bieber” are marked]



Outline
 Degrees and distributions
 Power Laws
 Business implications

 Why such power laws?


 The rich get richer model

 Measuring power laws



Power Laws

[Figure: two plots of Percentage of cities against City Population; the second is plotted on a log-log scale]
Power Laws
 What is a logarithmic axis?

 powers of a number will be uniformly spaced

[Figure: an axis with tick labels 10, 20, 30, 40, 50, 100, 200, and a second axis with tick labels 1, 2, 4, 8, 16, 32, 64, 128, 256]

 2^0=1, 2^1=2, 2^2=4, 2^3=8, 2^4=16, 2^5=32, 2^6=64, …


 The linear number line gets squished
 Distance between 50 & 100 is the same as between 100 & 200

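The spacing claims about a logarithmic axis can be checked numerically: a value's position on the axis is its logarithm, so equal ratios become equal distances. The short sketch below uses only Python's standard `math` module; the variable names are illustrative.

```python
import math

# Positions of the powers of two on a base-2 logarithmic axis.
powers_of_two = [1, 2, 4, 8, 16, 32, 64, 128, 256]
positions = [math.log2(v) for v in powers_of_two]
gaps = [b - a for a, b in zip(positions, positions[1:])]
print("gaps between consecutive powers of two:", gaps)

# Doubling always moves the same distance, wherever it starts.
gap_50_100 = math.log10(100) - math.log10(50)
gap_100_200 = math.log10(200) - math.log10(100)
print("50 -> 100:", gap_50_100, " 100 -> 200:", gap_100_200)
```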


Power Laws

[Figure: two plots of Percentage of users against Popularity (degree); the second is plotted on a log-log scale]

The log-log plot shows hidden structure
Power Laws

City Population (x axis)   Percentage of cities (y axis)
10^4   (a)                 1/10^2   (b)
10^5   (10*a)              1/10^4   (b/10^2)
10^6   (100*a)             1/10^6   (b/10^4)

[Figure: log-log plot of Percentage of users against Popularity (degree); 1 order of magnitude along x corresponds to 2 orders of magnitude along y]

slope = 2
y varies as 1/x^2

Power Laws
 The log-log plot shows the tail better
 Data shows a “line on log-log scales”
 If popularity=x, what is the corresponding % of users y?
 y ~ 1/x^slope

y = c · x^(-slope)

[Figure: log-log plot of Percentage of users (y) against Popularity (x); the data falls on a line whose slope is marked]


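Taking logarithms of y = c · x^(-slope) gives log y = log c - slope · log x, which is why the data falls on a line on log-log scales. As a rough sketch, the slope can be read off with a least-squares line through the logged points. The three points below are the ones from the city-population table (10^4, 10^5, 10^6 against 1/10^2, 1/10^4, 1/10^6); `loglog_fit` is a hypothetical helper, and a plain straight-line fit like this should be treated as a crude estimate on real, noisy data.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    slope = num / den
    return slope, mean_y - slope * mean_x

xs = [10**4, 10**5, 10**6]
ys = [1 / 10**2, 1 / 10**4, 1 / 10**6]
line_slope, intercept = loglog_fit(xs, ys)
print("fitted line slope:", line_slope)        # negative: y falls as x grows
print("power-law exponent:", -line_slope)      # the "slope" used on the slide
print("constant c:", 10 ** intercept)
```

The fitted line slopes downward, so the exponent in y = c · x^(-slope) is the negative of the fitted value.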
Power Laws

y = c · x^(-slope)

 Called a Power Law
 The slope is also called the exponent α
 The constant c doesn’t matter much

[Figure: line on a log-log plot of y against x, with the slope marked]
Power laws
 What does the exponent mean?

[Figure: two plots of Fraction of users against Popularity, one with α=4 and one with α=2]

 Smaller exponent → more skew, larger ratios of max/min

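One way to read the exponent: under y = c · x^(-α), the constant c cancels when two popularity levels are compared, and a user with popularity 10 is (100/10)^α times more common than a user with popularity 100. The sketch below is just that arithmetic; `times_more_common` is a hypothetical name.

```python
def times_more_common(x_small, x_large, alpha):
    """Ratio y(x_small) / y(x_large) under y = c * x**(-alpha); c cancels out."""
    return (x_large / x_small) ** alpha

for alpha in (2, 4):
    ratio = times_more_common(10, 100, alpha)
    print(f"alpha = {alpha}: popularity 10 is {ratio:,.0f}x more common than popularity 100")
```

With the smaller exponent, highly popular users are far less rare relative to ordinary ones, which is what produces the heavier tail.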


Power laws
 What is the typical exponent?



Power Laws
[Figure: Probability of observing this value, plotted against some “value” of interest (e.g., popularity)]

Small values are most frequent … but extremely high values can show up too!

Power Laws in the “statistical sense”


Power Laws
 But in business, power laws are used in a different sense…
 Rank users from highest to lowest popularity
 Plot popularity versus rank

[Figure: Popularity plotted against Rank; “Justin Bieber” and “Me” are marked]

A few users have the bulk of all followers, but most users are in the “long tail”
Power Laws
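[Figure: two panels. Statistical sense: Probability plotted against Popularity, with “Me” and “Justin Bieber” marked. Business sense: Popularity plotted against Rank, with the “Head” and the “Long tail” labeled and “Justin Bieber” and “Me” marked.]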



Power laws
 The 80/20 rule
 “The richest 20% of Americans hold 80% of the wealth”
 (actually 86% of the wealth)

[Figure: Wealth plotted against Percentage rank of wealthiest Americans (20, 40, 60, 80, 100); the region holding 80% of total wealth is marked]
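The "business sense" view can be reproduced by ranking values from highest to lowest and asking what share the top slice holds. The sketch below is an illustration on synthetic data: the popularity values (1000/rank for 1,000 hypothetical users) are an assumption, not real wealth or follower counts, and `top_share` is a hypothetical helper.

```python
def top_share(values, fraction):
    """Share of the total held by the top `fraction` of items, ranked high to low."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))   # number of items in the top slice
    return sum(ranked[:k]) / sum(ranked)

# Synthetic popularity: the user at rank r gets 1000 / r followers.
popularity = [1000 / rank for rank in range(1, 1001)]
share = top_share(popularity, 0.20)
print(f"top 20% of users hold {share:.0%} of all followers")
```

On this synthetic list the top fifth holds roughly four fifths of the total. Other long-tailed data can give a noticeably different split, so "80/20" is a rule of thumb and not a fixed law.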
Power laws
 “Zipf’s Law”
 How frequent is the 3rd most common word? 8th most common word? 100th most common word?

[Figure: Word frequency plotted against Rank of word]

The top words are extremely frequent (“the”, “and”, “is”), but most words are in the “long tail”


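A rank-versus-frequency table for words is easy to build with a counter. The sentence below is made up and far too short to show a real power law; the sketch only illustrates the bookkeeping. On a large corpus, Zipf's law says the frequency of a word is roughly inversely proportional to its rank.

```python
from collections import Counter

# Made-up toy text; a real analysis would use a large corpus.
text = "the cat and the dog and the bird saw the fish in the pond"
counts = Counter(text.split())

# most_common() returns (word, count) pairs sorted from most to least frequent.
ranked = counts.most_common()
for rank, (word, count) in enumerate(ranked, start=1):
    print(rank, word, count)
```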


Power laws (summary)
 The upshot
 Looks like a line on a log-log plot

 The ratio of the maximum to the minimum is huge


 NYC is 150,000 times bigger than the smallest city
 “the, of, and” are far more frequent than the most infrequent words

 A “right skew”, also known as the “long tail”


 A few people are extremely popular
 and account for most of the links on Twitter/Instagram



Outline
 Degrees and distributions
 Power Laws
 Business implications

 Why such power laws?


 The rich get richer model

 Measuring power laws



Business implications

Chris Anderson, ‘The Long Tail’, Wired, Issue 12.10 - October 2004
Business implications
 The long tail has always existed
 So why is it important now?

 Items in the tail can now be found


 Recommendation Engines can point consumers towards
obscure content
 They can be produced far more cheaply
 E.g., online music doesn’t require physical handling like CDs
 Technology enables operation at scale
 Amazon can afford to stock rare items



Business implications

Cater to the head…                            … or to the tail?
Common purchases: Walmart                     Small purchases, rare items: Amazon
Popular movies: Redbox, Blockbuster           Wide-ranging selection: Netflix
Big advertisers with marketing departments    Small advertisers: Google AdSense

Outline
 Degrees and distributions
 Power Laws
 Business implications

 Why such power laws?


 The rich get richer model

 Measuring power laws



Why power laws
 How can we end up with power law degree distributions?

 Degree distribution of a random graph model


 Concentrated around its mean

[Figure: two plots side by side: How many nodes against Degree, and How many males against Height (in cm)]


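The concentration of degrees around the mean can be seen in a small simulation. The slide does not say which random graph model it means; the sketch below assumes the simplest one, where every pair of nodes is linked independently with the same probability p. The function name, the parameters n=500 and p=0.02, and the seed are all illustrative choices.

```python
import random
from statistics import mean

def random_graph_degrees(n, p, seed=0):
    """Degrees in a graph where each pair of nodes is linked with probability p."""
    rng = random.Random(seed)
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:      # flip an independent coin for this pair
                degree[u] += 1
                degree[v] += 1
    return degree

degrees = random_graph_degrees(n=500, p=0.02)
avg = mean(degrees)
print("average degree:", avg)
print("min and max degree:", min(degrees), max(degrees))
```

Nobody ends up orders of magnitude above the average here, which is why this kind of model cannot explain a Justin Bieber.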
The rich get richer
 How can we end up with power law degree distributions?

 Preferential attachment
 Start with a small village with m_0 villagers
 everyone knows everyone
 New people arrive one at a time
 Each forms exactly m friendships with existing villagers
 They prefer to connect to the popular villagers
 preferential attachment
 The rich get richer
The rich get richer
 How can we end up with power law degree
distributions?

 Preferential attachment (or, the rich get richer)


 “prefer to connect to the popular villagers”
 Probability of connecting to villager v is proportional to
current degree of v

 Prob(connection to v) = (degree of v) / (total degree of all villagers)

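The connection probability is a one-line calculation: each villager's degree divided by the total degree. The sketch below uses exact fractions so the arithmetic is easy to check by hand; the example village (degrees 2, 3, 3, 2) and the helper name `connection_probabilities` are chosen for illustration.

```python
from fractions import Fraction

def connection_probabilities(degree):
    """Map each villager to degree / (total degree of all villagers)."""
    total = sum(degree.values())
    return {v: Fraction(d, total) for v, d in degree.items()}

# Illustrative village: villager -> current degree.
village = {1: 2, 2: 3, 3: 3, 4: 2}
probs = connection_probabilities(village)
for v, p in probs.items():
    print(f"villager {v}: {p} = {float(p):.2f}")
```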


The rich get richer
 Initial village with 3 people
 Everyone’s degree=2
 Total degree = 6
 Probability of connection = 2/6 = 1/3 for everyone

[Figure: network of nodes 1, 2, 3, all connected to each other]

Node:               1     2     3
Degree:             2     2     2
Total Degree:       2+2+2 = 6
Prob(connection):   1/3   1/3   1/3

 New person (#4) arrives
 4 picks two nodes using these probabilities
 Say, he connects to 2 and 3



The rich get richer
 Initial village with 3 people
 New person (#4) arrives
 4 connects to 2 and 3
 2 and 3 are now more popular

[Figure: network of nodes 1, 2, 3, 4, with 4 connected to 2 and 3]

Node:               1     2     3     4
Degree:             2     3     3     2
Total Degree:       2+3+3+2 = 10
Prob(connection):   0.2   0.3   0.3   0.2

 New person (#5) arrives
 5 picks two friends with these probabilities
 Say, 3 and 4



The rich get richer
 Initial village with 3 people
 New person (#4) arrives
 4 connects to 2 and 3
 New person (#5) arrives
 5 connects to 3 and 4

[Figure: network of nodes 1, 2, 3, 4, 5, with 4 connected to 2 and 3, and 5 connected to 3 and 4]

Node:               1      2      3      4      5
Degree:             2      3      4      3      2
Total Degree:       2+3+4+3+2 = 14
Prob(connection):   0.14   0.21   0.29   0.21   0.14

 3 is most popular now
 twice as popular as 1 & 5

 And the process continues…


The rich get richer
 Once someone becomes popular,
 (by luck? by a good initial product?)
 they keep gaining popularity

 After a long time…


 degree distribution is a power law with α=3

 This isn’t the only way to get to power laws


 But it is pretty intuitive…

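The village process can be simulated in a few lines to watch the long tail appear. This is a minimal sketch under stated assumptions, not a reference implementation: it starts from 3 villagers who all know each other, each newcomer makes m=2 friendships, and the newcomer redraws until the two friends are distinct (a simplification of "picks two nodes using these probabilities"). The function name, village size, and seed are illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n, m0=3, m=2, seed=0):
    """Grow a village to n people; returns {villager: degree}. Requires m <= m0."""
    rng = random.Random(seed)
    # Initial village: everyone knows everyone, so each degree is m0 - 1.
    degree = {v: m0 - 1 for v in range(m0)}
    # Villager v appears degree[v] times, so a uniform draw from this list
    # picks v with probability degree[v] / total degree.
    endpoints = [v for v in range(m0) for _ in range(m0 - 1)]
    for new in range(m0, n):
        friends = set()
        while len(friends) < m:               # redraw until m distinct friends
            friends.add(rng.choice(endpoints))
        degree[new] = 0
        for f in friends:
            degree[f] += 1
            degree[new] += 1
            endpoints.extend([f, new])
    return degree

degree = preferential_attachment(n=2000)
histogram = Counter(degree.values())
print("most common degree:", histogram.most_common(1)[0][0])
print("largest degree:", max(degree.values()))
```

Most villagers stay at or near the minimum degree of 2 while a handful of early arrivals collect a large share of the friendships. Plotting `histogram` on log-log axes is one way to eyeball the power law.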


The rich get richer
 This happens in business too

 A new product
 (say, a video app for iOS, in the early days of the iPhone)
 gets a few users
 who rate it, and it goes to the top among all video apps
 New users can find it and buy it more easily
 Even more popular
 The cycle continues



Outline
 Degrees and distributions
 Power Laws
 Business implications

 Why such power laws?


 The rich get richer model

 Measuring power laws



Summary
 Popularities are often not close to their averages
 They sport a long tail
 Often power-law distributions
 80/20 law, “Zipf’s Law”, many other names…

 One way to get to power laws


 Preferential attachment
 Someone gets popular initially
 and the popular folks become even more popular over time

 For the next class, install Gephi (gephi.org)!


Acknowledgements
 Lada Adamic, U. Michigan
 Mark Newman: Power Laws, Pareto distributions, and Zipf’s Law
 Power laws and logarithmic binning: http://www.esapubs.org/archive/ecol/E089/052/appendix-A.pdf
