Towards Segregation Aware Social Recommendations
INTRODUCTION
• A community in a social network is a group of people who are more tightly interconnected with each other than with the rest of the network.
• Users in a community tend to interact more frequently with each other and share common interests.
• Detecting such tight communities is one of the essential tools of social network analysis.
• Manually adding users to circles is laborious and requires constant updates as new connections are made.
Does Similarity-Based Friendship Recommendation Increase or Decrease Social Segregation?
Segregation is the social division of human beings based on any number of factors, including race or nationality. For example: the BJP and Congress communities.

Less segregation is better for society in the long term.

So, our task is to see how a social recommendation system segregates different types of users during friendship recommendation.
Literature Study
• In SNAP (Stanford Network Analysis Project), the concept of viewing users as individual “egos” was introduced.
• The authors formulated circle detection as a clustering problem on a user's ego-network, the network of friendships between her friends.
• That work studies the problem of automatically discovering users’ social circles.
• The current project is built on a personal friendship network and draws on the idea of the ego-network.
Related work
• Barbieri et al. (2014) proposed a method to predict new connections between people with a stochastic topic model. The method also indicates whether a link is “topical” or “social” and explains the type of recommendation produced.

• Gupta et al. (2013) presented Twitter’s user recommendation service, which is based on shared interests, common connections, and other related factors.

• Liben-Nowell and Kleinberg (2003) studied user recommendation as a link prediction problem. They developed several approaches, based on metrics that analyze the proximity of nodes in a social network, to infer the probability of new connections between users.
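To make "proximity metrics" concrete, here is a minimal Python sketch of three neighborhood scores of the kind Liben-Nowell and Kleinberg evaluate: common neighbors, the Jaccard coefficient and Adamic/Adar. The adjacency-dict representation and the toy graph are illustrative assumptions, not part of this project's code.

```python
import math

def common_neighbors(adj, u, v):
    """Number of friends shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared friends divided by the size of the union of both friend sets."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Shared friends weighted by 1 / log(degree): rare common friends count more.

    A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

# Toy undirected graph stored as an adjacency dict of sets (illustrative only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(common_neighbors(adj, "a", "d"))  # 2 (b and c)
print(jaccard(adj, "a", "d"))           # 1.0 (both friend sets are {b, c})
print(adamic_adar(adj, "a", "d"))       # 2 / ln(3), about 1.82
```

All three scores only look at the immediate neighborhoods of the two users, which is why they favor pairs that are already close in the network.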
Dataset Description

• We used the Facebook dataset provided by Stanford University, which consists of networks with predefined communities.

• This dataset consists of 'circles' (or 'friends lists') from Facebook. The data has been anonymized by replacing the Facebook-internal ID of each user with a new value.

• It is possible to determine whether two users have the same political affiliation, but not what their individual political affiliations represent.
[Fig 2. User profile tree representation: example profiles of Alan Turing and Dilly Knox, each branching into first name, last name, work (position, company) and education (name, type)]
Profile information in all of our datasets can be represented as a tree in which each level encodes increasingly specific information. From Facebook we collect data from 26 categories, including hometowns, birthdays, colleagues, and political affiliations.
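The tree view of a profile can be illustrated with a small Python sketch that flattens each profile into root-to-leaf paths and compares two users by the paths they share. The nested-dict layout below is a hypothetical encoding of the Turing/Knox example in Fig 2; the real dataset is anonymized and stores its features differently.

```python
def flatten(tree, prefix=()):
    """Yield every root-to-leaf path of a nested profile tree as a tuple."""
    if isinstance(tree, dict):
        for key, subtree in tree.items():
            yield from flatten(subtree, prefix + (key,))
    elif isinstance(tree, list):
        for item in tree:               # repeated entries, e.g. two jobs
            yield from flatten(item, prefix)
    else:
        yield prefix + (tree,)          # leaf value

# Hypothetical encoding of the two example profiles in Fig 2.
turing = {
    "first name": "Alan", "last name": "Turing",
    "work": [{"position": "Cryptanalyst", "company": "GC&CS"}],
    "education": [{"name": "Cambridge", "type": "College"},
                  {"name": "Princeton", "type": "Graduate School"}],
}
knox = {
    "first name": "Dilly", "last name": "Knox",
    "work": [{"position": "Cryptanalyst", "company": "GC&CS"},
             {"position": "Cryptanalyst", "company": "Royal Navy"}],
    "education": [{"name": "Cambridge", "type": "College"}],
}

shared = set(flatten(turing)) & set(flatten(knox))
for path in sorted(shared):
    print(" > ".join(path))
```

Flattening keeps each level of specificity in the path but drops the pairing between sibling fields (which position belongs to which company), a simplification accepted here for brevity.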
Main objectives

• To design and implement a friendship recommendation simulation using the Facebook dataset.

• To run the simulation with two different friendship recommendation algorithms.

• To see how the different friendship recommendation algorithms cause social segregation in the social network.

• To find an intervention mechanism for downstream applications so that they do not further reinforce societal segregation.
Implementation of the Friendship Recommendation Simulation
Creating the Network:
We form the network as given in the dataset and represent it as G = (V, E), where V is the set of nodes and E is the set of edges. Each edge e ∈ E is a pair e = (v_i, v_j) with v_i, v_j ∈ V.

Fig 3. Representing a social network as a graph

Users in our dataset also carry political labels, namely 0 or 1. All users with political label 0 form group 0 and all users with political label 1 form group 1; some users belong to neither group.
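A minimal Python sketch of this step, assuming the edges arrive as an edge list with two whitespace-separated integer ids per line and the political labels as a dict from node id to 0 or 1. The function names `build_graph` and `assign_groups`, and the sample edges and labels, are hypothetical.

```python
from collections import defaultdict

def build_graph(edge_lines):
    """Build the undirected graph G = (V, E) as an adjacency dict of sets.

    Assumes one edge per line, written as two whitespace-separated integer ids.
    """
    adj = defaultdict(set)
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        u, v = int(parts[0]), int(parts[1])
        if u != v:    # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def assign_groups(adj, political_label):
    """Map each node to group 0, group 1, or None when it has no political label."""
    return {v: political_label.get(v) for v in adj}

edges = ["0 1", "0 2", "1 2", "2 3", "3 4"]
labels = {0: 0, 1: 0, 3: 1, 4: 1}  # hypothetical labels; node 2 is unlabeled
G = build_graph(edges)
groups = assign_groups(G, labels)
print(sorted(G[2]), groups[2])  # [0, 1, 3] None
```

Storing neighbors as sets makes the later steps (mutual friends, path counting, clustering) simple set operations.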
User Arrival Process:

Some users in social networks are more active than others, so we model activity with a user-specific activity rate.
Let r_i be the activity rate of v_i. At the start of the simulation, we sample the activity rate of every user from the following normal distribution, clipped to [0, 1]:

r_i ∼ Normal(mean = 0.5, std = 0.2)

The unit of r_i is the number of activities per unit time. We then create a Poisson point process for each user using that user's activity rate.
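A minimal sketch of the activity-rate sampling using Python's standard `random` module; clipping to [0, 1] is done with `min`/`max`. The function name and the seed are illustrative assumptions.

```python
import random

def sample_activity_rates(num_users, mean=0.5, std=0.2, seed=None):
    """Sample r_i ~ Normal(mean, std) for every user and clip it to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean, std))) for _ in range(num_users)]

rates = sample_activity_rates(10_000, seed=42)
print(min(rates) >= 0.0, max(rates) <= 1.0)  # True True
```

With mean 0.5 and std 0.2, roughly 0.6% of draws fall below 0 and are clipped to a rate of 0; such users never become active in the simulation.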

The Poisson process can be used to model the number of occurrences of events, such as user arrivals, during a certain period of time.
Modeling inter-arrival times and arrival times in a Poisson process:
If the number of events (user activities) in a period is modeled with the discrete Poisson distribution, then the time between consecutive events follows the exponential distribution, which is continuous.

Simulating inter-arrival times in a Poisson process:

We use the inverse-CDF technique: we construct the inverse of the CDF and feed it probability values drawn from a Uniform(0, 1) distribution, which gives the inter-arrival time corresponding to each probability. For rate λ, the CDF of the inter-arrival time is $F(t) = 1 - e^{-\lambda t}$, so its inverse is

$$t = F^{-1}(p) = -\frac{\ln(1 - p)}{\lambda}, \qquad p \sim \mathrm{Uniform}(0, 1)$$

The following table lists patient inter-arrival times, in hours, at an ER for the first 10 patients. We generated this data using the above formula, with λ set to 5 patients per hour.
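A minimal Python sketch of the inverse-CDF step, assuming exponential inter-arrival times with CDF F(t) = 1 - e^(-λt), whose inverse is t = -ln(1 - p) / λ. The seeds are arbitrary, so the sampled values are not the ones in the ER table; the function names are hypothetical.

```python
import math
import random

def inter_arrival_time(lam, p):
    """Inverse CDF of Exponential(lam): solve p = 1 - exp(-lam * t) for t."""
    return -math.log(1.0 - p) / lam

def sample_inter_arrival_times(lam, n, seed=None):
    """Feed n Uniform(0, 1) draws through the inverse CDF."""
    rng = random.Random(seed)
    return [inter_arrival_time(lam, rng.random()) for _ in range(n)]

# lambda = 5 arrivals per hour, as in the ER example; times are in hours.
first_ten = sample_inter_arrival_times(lam=5, n=10, seed=1)
print([round(t, 3) for t in first_ten])

# Sanity check: the mean inter-arrival time should be close to 1 / lambda = 0.2 h.
many = sample_inter_arrival_times(lam=5, n=100_000, seed=2)
print(round(sum(many) / len(many), 2))  # about 0.2
```

`random.random()` returns values in [0, 1), so `1 - p` is never 0 and the logarithm is always defined.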

Simulating a Poisson process:


1. For the given average incidence rate λ, use the inverse-CDF technique to generate inter-arrival times.
2. Generate the actual arrival times by taking a running sum of the inter-arrival times.
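The two steps can be sketched in Python as follows. Beyond the steps themselves, the sketch assumes that each user's process uses their activity rate r_i as λ and that the simulation runs for a fixed time horizon; it then merges all per-user arrivals into one time-ordered login sequence. The function names and the `horizon` parameter are hypothetical.

```python
import heapq
import itertools
import math
import random

def arrival_times(rate, horizon, rng):
    """Arrival times in [0, horizon) of a Poisson process with the given rate.

    Step 1: draw inter-arrival times with the inverse CDF -ln(1 - U) / rate.
    Step 2: take their running sum to obtain the absolute arrival times.
    """
    if rate <= 0:
        return []  # a user whose clipped rate is 0 never logs in
    gaps = (-math.log(1.0 - rng.random()) / rate for _ in itertools.count())
    times = []
    for t in itertools.accumulate(gaps):
        if t >= horizon:
            break
        times.append(t)
    return times

def login_order(rates, horizon, seed=None):
    """Merge all users' arrival times into a single time-ordered login sequence."""
    rng = random.Random(seed)
    per_user = [[(t, user) for t in arrival_times(r, horizon, rng)]
                for user, r in enumerate(rates)]
    return [user for _, user in heapq.merge(*per_user)]

logins = login_order([0.9, 0.1, 0.5], horizon=100.0, seed=7)
print(len(logins), logins[:10])
```

Each per-user list is already sorted by time, so `heapq.merge` can interleave them lazily; more active users simply appear more often in the resulting login order.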
User’s Affinity Score:

• Social network platforms use network structure, user histories, likes/dislikes, etc. to compute an affinity score between every pair of users who are not directly linked, and then use those affinity scores for friend suggestions.

• Here we compute the affinity score from network structure only: it is based on the number of minimum paths of length 2 and length 3 between two users, followed by score normalization.
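The affinity score is described only in outline (paths of length 2 and length 3, then normalization), so the Python sketch below is one possible reading rather than the exact formula used. It counts simple paths of length 2 and 3 from a user to every non-friend, combines them as n2 + beta * n3 with a hypothetical weight beta = 0.5, and divides by the largest score so the results lie in [0, 1]. It counts all such paths, not only minimum ones, which is also an assumption.

```python
def affinity_scores(adj, u, beta=0.5):
    """Score every non-friend of u by its paths of length 2 and 3 from u.

    n2[v] counts paths u-a-v (mutual friends); n3[v] counts paths u-a-b-v.
    beta and the max-normalization are illustrative assumptions.
    """
    n2, n3 = {}, {}
    for a in adj[u]:
        for b in adj[a]:
            if b == u:
                continue
            n2[b] = n2.get(b, 0) + 1              # path u-a-b
            for c in adj[b]:
                if c not in (u, a):
                    n3[c] = n3.get(c, 0) + 1      # path u-a-b-c
    candidates = (set(n2) | set(n3)) - adj[u] - {u}
    raw = {v: n2.get(v, 0) + beta * n3.get(v, 0) for v in candidates}
    top = max(raw.values(), default=0)
    return {v: s / top for v, s in raw.items()} if top > 0 else {}

adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(affinity_scores(adj, 1))  # node 4 scores 1.0, node 5 about 0.33
```

Here node 4 has two mutual friends with node 1 (raw score 2 + 0.5 * 2 = 3), while node 5 is reachable only through two paths of length 3 (raw score 1).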
User Dynamics:

• We model friendship dynamics using user-specific send probabilities and user-specific accept probabilities in our simulation.

• Based on these probabilities, the user then chooses whether to send friend requests to the users in their recommendation list, and whether to accept the friend requests they have received.

The user sends a request to the j-th ranked friend recommendation R_i[j] based on Bernoulli sampling with the user's send probability.

The user then chooses to accept or reject pending friend requests, if any. If v_j had earlier sent a friend request, v_i accepts or rejects it based on Bernoulli sampling with the user's accept probability.
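A minimal Python sketch of one login under these dynamics: one Bernoulli trial per recommendation with the user's send probability, then one Bernoulli trial per pending request with the user's accept probability. It assumes the probabilities are per-user constants (a rank-dependent send probability would also fit the description); the names `on_login`, `pending` and `recommend_all` are hypothetical.

```python
import random

def send_requests(user, recommendations, send_prob, rng):
    """One Bernoulli(send_prob[user]) trial per ranked recommendation R_i[j]."""
    return [v for v in recommendations if rng.random() < send_prob[user]]

def answer_requests(user, pending, accept_prob, rng):
    """One Bernoulli(accept_prob[user]) trial per pending request to `user`."""
    accepted = [v for v in pending.get(user, []) if rng.random() < accept_prob[user]]
    pending[user] = []  # every pending request is now answered
    return accepted

def on_login(user, adj, recommend, send_prob, accept_prob, pending, rng):
    """Hypothetical login step: send requests first, then answer pending ones."""
    for v in send_requests(user, recommend(adj, user), send_prob, rng):
        pending.setdefault(v, []).append(user)
    for v in answer_requests(user, pending, accept_prob, rng):
        adj[user].add(v)   # an accepted request becomes an undirected edge
        adj[v].add(user)

# Tiny deterministic demo: everyone always sends; user 2 never accepts.
def recommend_all(adj, u):
    return [v for v in adj if v != u and v not in adj[u]]

adj = {0: set(), 1: set(), 2: set()}
send_prob = {0: 1.0, 1: 1.0, 2: 1.0}
accept_prob = {0: 1.0, 1: 1.0, 2: 0.0}
pending = {}
rng = random.Random(0)
for user in (0, 1, 2):
    on_login(user, adj, recommend_all, send_prob, accept_prob, pending, rng)
print(adj)  # {0: {1}, 1: {0}, 2: set()}
```

Because accepted requests immediately become edges, the next recommendation computed for any user already sees the updated graph.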
The following is a screenshot from our friendship recommendation simulation for user 3195.
Recommendation Based on Network Structure (Algorithm 1)

One of the important factors when recommending new friends to a user is the number of mutual (common) friends they share, together with how close the candidate nodes are to the user in the network.

We take the friendship graph, with each user's group label, as input and output a list of recommended new friends ranked by their distance from the user.

The algorithm is easiest to understand using our practice graph, which has six nodes, namely A, B, C, D, E, F.
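A minimal Python sketch of the distance-based idea behind Algorithm 1: breadth-first search gives hop distances, non-friends are ranked by distance (closest first) and ties are broken by the number of mutual friends. The tie-breaking rule and the edges of the six-node practice graph are assumptions made for illustration; the original practice graph is not reproduced here.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def recommend_by_distance(adj, user, k=10):
    """Rank non-friends by distance (closest first), then by mutual friends."""
    dist = bfs_distances(adj, user)
    candidates = [v for v in dist if v != user and v not in adj[user]]
    candidates.sort(key=lambda v: (dist[v], -len(adj[user] & adj[v]), str(v)))
    return candidates[:k]

# Hypothetical edges for a six-node practice graph A-F.
practice = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"B", "E"},
    "E": {"C", "D", "F"},
    "F": {"E"},
}
print(recommend_by_distance(practice, "A"))  # ['D', 'E', 'F']
```

Only nodes reachable from the user are candidates, and the closest ones always come first, so recommendations tend to stay inside the user's own neighborhood.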
Where does the algorithm fail?

[Figure: network of users u1 to u20 with their group labels]
Users network with their group labels.

Group 0 users: u2, u3, u10, u11, u12
Group 1 users: u8, u6, u7, u16, u17, u18


Recommendations Based on Reservation
(Algorithm 2)
• Generates the graph from the given dataset and stores the group of each node: nodes with political label 0 belong to group 0 and nodes with political label 1 belong to group 1.
• Simulates the order in which users arrive in the network using a Poisson point process.
• For a user, calculates affinity scores to all other nodes; the score is based on the number of minimum paths of length 2 and length 3, followed by score normalization.
• Takes the top 10 recommendations by affinity score, of which the top 8 (80%) are regular recommendations and the bottom 2 (20%) are reserved for users from the other group.
• If a group 0 user arrives in the network, 20% of their recommendation list is reserved for group 1 users, and vice versa.
• Then follows the user dynamics process.

Here we reserve 20% of the list for users from the other group; similarly, the reservation can be increased to 30%.
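The reservation step can be sketched in Python as below. It assumes the affinity scores have already been computed as a dict from candidate to score, and that groups are labeled 0 and 1. How unlabeled users, or users with too few other-group candidates, are handled is not stated, so the sketch falls back to a plain top-k list for unlabeled users and backfills empty reserved slots with regular candidates; both choices are assumptions, and the function name is hypothetical.

```python
def reserved_recommendations(scores, groups, user, k=10, reserve=0.2):
    """Top-k list with the last round(k * reserve) slots kept for the other group."""
    ranked = sorted(scores, key=lambda v: (-scores[v], str(v)))
    own = groups.get(user)
    if own is None:
        return ranked[:k]  # assumption: unlabeled users get a plain top-k list
    other = 1 - own
    r = round(k * reserve)                       # 2 slots for k=10 and 20%
    regular = ranked[:k - r]                     # best-scoring candidates
    reserved = [v for v in ranked[k - r:] if groups.get(v) == other][:r]
    chosen = set(regular) | set(reserved)
    fill = [v for v in ranked if v not in chosen]  # assumption: backfill gaps
    return regular + reserved + fill[:k - len(regular) - len(reserved)]

scores = {c: 12 - i for i, c in enumerate("abcdefghijkl")}
groups = {"u": 0, **{c: 0 for c in "abcdefghij"}, "k": 1, "l": 1}
print(reserved_recommendations(scores, groups, "u"))               # 20% reservation
print(reserved_recommendations(scores, groups, "u", reserve=0.3))  # 30% reservation
```

In this toy example a plain top-10 list would contain only group 0 users, while the reserved slots bring in the two group 1 candidates "k" and "l".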
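Evaluation Metric
This metric measures segregation between the groups in the network:

Segregation metric = A / B

where A is the average inter-group distance
and B is the average of the intra-group average distances.

A higher value means that users are farther from the other group than from their own group, that is, the network is more segregated.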
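A minimal Python sketch of the A / B metric using breadth-first-search distances on an unweighted graph. Pairs of users that cannot reach each other are skipped, which is an assumption, since the handling of disconnected pairs is not specified. Nodes without a group label still count as intermediate hops; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else float("nan")

def segregation_metric(adj, groups):
    """A / B: average inter-group distance over the mean intra-group distance."""
    g0 = [v for v, g in groups.items() if g == 0]
    g1 = [v for v, g in groups.items() if g == 1]
    dist = {v: bfs_distances(adj, v) for v in g0 + g1}
    a = mean(dist[u][v] for u in g0 for v in g1 if v in dist[u])
    intra0 = mean(dist[u][v] for u, v in combinations(g0, 2) if v in dist[u])
    intra1 = mean(dist[u][v] for u, v in combinations(g1, 2) if v in dist[u])
    b = (intra0 + intra1) / 2
    return a / b

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(segregation_metric(path, {0: 0, 1: 0, 2: 1, 3: 1}))  # 2.0
print(segregation_metric(path, {0: 0, 1: 1, 2: 0, 3: 1}))  # 0.75
```

On the path 0-1-2-3, putting each group on its own side scores 2.0, while alternating the groups scores 0.75, so the more segregated layout gets the higher value.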

Global cluster coefficient:

The clustering coefficient measures the degree to which nodes in a graph tend to cluster together. We used it to measure how strongly each algorithm makes nodes cluster within a specific group.
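The heading says "global", while the experiment tracks the average clustering coefficient, so this Python sketch computes both using the standard definitions: the local coefficient of a node is the fraction of its neighbor pairs that are themselves connected, and the global coefficient (transitivity) is closed triplets divided by all connected triplets. The function names are hypothetical; `induced_subgraph` shows one way a group-specific subnetwork could be formed.

```python
from itertools import combinations

def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the edges among them (e.g. one group)."""
    nodes = set(nodes)
    return {v: adj[v] & nodes for v in nodes}

def linked_neighbor_pairs(adj, v):
    """Number of pairs of v's neighbors that are themselves friends."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])

def local_clustering(adj, v):
    k = len(adj[v])
    return 2.0 * linked_neighbor_pairs(adj, v) / (k * (k - 1)) if k >= 2 else 0.0

def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj) if adj else 0.0

def global_clustering(adj):
    """Transitivity: closed triplets divided by all connected triplets."""
    closed = sum(linked_neighbor_pairs(adj, v) for v in adj)
    triplets = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triplets if triplets else 0.0

# A triangle 0-1-2 with an extra friend 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(adj))                               # 7/12, about 0.583
print(global_clustering(adj))                                # 0.6
print(average_clustering(induced_subgraph(adj, [0, 1, 2])))  # 1.0
```

A new cross-group edge that does not close a triangle adds an open triplet, which lowers both coefficients.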
EXPERIMENT

We tested the two recommendation systems as follows:

• Create the network along with the user-specific groups, i.e. group 0 and group 1.
• Randomly generate 50000 user logins through a Poisson point process.
• Whenever a user accepts a friend request, update the graph, so the next friendship recommendation is based on the updated graph.
• After every 1000 user logins, do the following:
  • Calculate the group-specific average clustering coefficient of the group 0 subnetwork and store it in a list.
  • Calculate the group-specific average clustering coefficient of the group 1 subnetwork and store it in a list.
  • Calculate the evaluation metric by dividing the average inter-group distance by the average of the intra-group average distances, and store it in a list.
  • Calculate the overall average clustering coefficient of the network and store it in a list.
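The loop can be sketched as a small Python driver. The `on_login` and `measure` callbacks are hypothetical placeholders: `on_login` would recommend, send and answer requests and update the graph, and `measure` would return the four quantities listed above (the two group-specific average clustering coefficients, the segregation metric and the overall average clustering coefficient). Only the checkpoint logic is shown concretely.

```python
def run_experiment(logins, on_login, measure, every=1000):
    """Replay user logins in order and record metrics after every `every` logins.

    logins:   user ids in arrival order (e.g. 50000 Poisson-process logins)
    on_login: callback that runs recommendation and user dynamics for one login
    measure:  callback returning a dict of metrics for the current graph
    """
    history = []
    for step, user in enumerate(logins, start=1):
        on_login(user)
        if step % every == 0:
            history.append((step, measure()))
    return history

# Placeholder callbacks that only count logins.
seen = []
history = run_experiment(range(2500), seen.append, lambda: {"logins": len(seen)})
print(history)  # [(1000, {'logins': 1000}), (2000, {'logins': 2000})]
```

Recording a snapshot per checkpoint gives one time series per metric, which is what the result plots compare between the two algorithms.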
RESULT
Recommendation Based on Network Structure (Algorithm 1)

• Here we can see that group segregation increases with the number of user logins.

• This indicates that similarity-based friendship recommendation increases social segregation.
• We computed the average clustering coefficient for each group-specific subnetwork and for the entire network.
• Here we can see that the clustering coefficient of the nodes increases with the number of user logins.

• Users cluster together more tightly because the algorithm only recommends users who are close to the target user.
Recommendations Based on Reservation (Algorithm 2)

• Here we can see that group segregation decreases significantly with the number of user logins compared to Algorithm 1.
• With a higher reservation percentage, more new connections are made between users from different groups; the figure shows that 30% reservation makes the network less segregated than 20% reservation.
• Compared to Algorithm 1, Algorithm 2 lowers the average clustering coefficient of the nodes as the number of user logins increases, so nodes do not cluster as tightly as under Algorithm 1.

• The extra cross-group connections at higher reservation percentages also create more open triangles, which results in less clustering of nodes.
Conclusion and Future Work
In summary, we performed graph analysis and studied various properties of the social network. After understanding the network, we ran a friendship recommendation simulation and observed how two different types of algorithms recommend friends to specific users. We also analyzed the network's user dynamics by applying the evaluation metric and the average clustering coefficient at different login intervals. In brief, we found that algorithms whose suggestions are based on the closest or nearest distance generally form clusters. The current dataset does not list specific features for each user; if such features were provided, we could use them to detect users with similar features and recommend friends.

THANK YOU
