Data and Applications Security

UT DALLAS

Erik Jonsson School of Engineering & Computer Science


FEARLESS engineering
Data and Applications Security

Security and Privacy in
Online Social Networks
Murat Kantarcioglu
Bhavani Thuraisingham

Thanks to Raymond Heatherly and Barbara Carminati for
helping in slide preparations

April 2012
Outline
Introduction to Social Networks
Properties of Social Networks
Social Network Analysis Basics
Data Privacy Basics
Privacy and Social Networks
Access control issues for Online Social Networks

Social Networks
Social networks have important implications for our
daily lives.
Spread of Information
Spread of Disease
Economics
Marketing

Social network analysis could be used for many activities
related to information and security informatics.
Terrorist network analysis


Enron Social Graph*
* http://jheer.org/enron/
Romantic Relations at Jefferson High School
Emergence of Online Social Networks
Online social networks have
become increasingly popular.
Example: Facebook*
Facebook has more than 200
million active users.
More than 100 million users
log on to Facebook at least
once each day
More than two-thirds of
Facebook users are outside of
college
The fastest growing
demographic is those 35 years
old and older
*http://www.facebook.com/press/info.php?statistics
Properties of Social Networks
Small-world phenomenon
Milgram asked participants to pass a letter to one of their
close contacts in order to get it to an assigned individual
Most of the letters were lost (~75% of the letters)
The letters that reached their destination had passed
through only about six people.
Origin of "six degrees of separation"
Mean geodesic distance $l$ of graphs grows logarithmically or
even slower with the network size ($d_{ij}$ is the shortest distance
between nodes $i$ and $j$):

$$l = \frac{2}{n(n+1)} \sum_{i \ge j} d_{ij}$$
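The mean geodesic distance can be computed with one breadth-first search per node. The sketch below is a minimal illustration, assuming an unweighted, undirected, connected graph stored as an adjacency dictionary, and assuming the convention in which the sum runs over pairs i >= j (self-pairs contribute zero) and is normalized by n(n+1)/2. The function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj):
    """l = 2 / (n (n + 1)) * sum over pairs i >= j of d_ij (connected graph assumed)."""
    nodes = sorted(adj)
    n = len(nodes)
    total = 0
    for idx, i in enumerate(nodes):
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        # Count each unordered pair once (j <= i); d_ii is zero.
        total += sum(dist[j] for j in nodes[: idx + 1])
    return 2 * total / (n * (n + 1))


# Toy path graph a - b - c (hypothetical data).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(mean_geodesic_distance(path))
```

For large networks, one search per node becomes expensive, and sampling a subset of source nodes is a common approximation.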
Small-World Example: Six Degrees of
Kevin Bacon
Properties of Social Networks
Degree Distribution
Clustering
Other important properties
Community Structure
Assortativity
Clustering Patterns
Homophily
Many of these properties could be used for
analyzing social networks.

Social Network Mining
Social network data is represented as a graph
Individuals are represented as nodes
Nodes may have attributes to represent personal traits
Relationships are represented as edges
Edges may have attributes to represent relationship
types
Edges may be directed
Common Social Network Mining tasks
Node classification
Link Prediction


Data Privacy Basics
How to share data without violating privacy?
Meaning of privacy?
Identity disclosure
Sensitive Attribute disclosure
Current techniques for structured data
K-anonymity
L-diversity
Secure multi-party computation
Problem: Publishing private data while, at the same
time, protecting individual privacy
Challenges:
How to quantify privacy protection?
How to maximize the usefulness of published data?
How to minimize the risk of disclosure?


Sanitization and Anonymization
Automated de-identification of private data with certain privacy
guarantees
As opposed to the formal determination by statisticians required by
HIPAA
Two major research directions
1. Perturbation (e.g. random noise addition)
2. Anonymization (e.g. k-anonymization)
Removing unique identifiers is not sufficient
Quasi-identifier (QI)
Maximal set of attributes that could help identify individuals
Assumed to be publicly available (e.g., voter registration lists)
As a process
1. Remove all unique identifiers
2. Identify QI-attributes, model adversary's background knowledge
3. Enforce some privacy definition (e.g. k-anonymity)

Re-identifying anonymous data (Sweeney 01)
37 US states mandate
collection of information
She purchased the voter
registration list for
Cambridge, Massachusetts
54,805 people
69% unique on postal code
and birth date
87% unique US-wide with all three
(postal code, birth date, and gender)
Solution: k-anonymity
Any combination of values
appears at least k times
Developed systems that
guarantee k-anonymity
Minimize distortion of results
k-Anonymity
Each released record should be indistinguishable from at least (k-1)
others on its QI attributes
Alternatively: cardinality of any query result on released data should be
at least k
k-anonymity is one (the first) of many privacy definitions in this line of
work
l-diversity, t-closeness, m-invariance, delta-presence...
Complementary Release Attack
Different releases can be linked together to compromise
k-anonymity.
Solution:
Consider all of the released tables before releasing the new one,
and try to avoid linking.
Other data holders may release some data that can be used in
this kind of attack. Generally, this kind of attack is hard to
prevent completely.
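The k-anonymity definition on this slide can be checked mechanically: group the released records by their quasi-identifier values and verify that every group has at least k members. The following is a minimal sketch under that reading; the function name, attribute names, and records are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QI values appears in at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical released table with generalized quasi-identifiers.
released = [
    {"zip": "7508*", "birth_year": "198*", "sex": "F", "diagnosis": "flu"},
    {"zip": "7508*", "birth_year": "198*", "sex": "F", "diagnosis": "asthma"},
    {"zip": "7525*", "birth_year": "197*", "sex": "M", "diagnosis": "flu"},
    {"zip": "7525*", "birth_year": "197*", "sex": "M", "diagnosis": "diabetes"},
]
QI = ("zip", "birth_year", "sex")

print(is_k_anonymous(released, QI, 2))
print(is_k_anonymous(released, QI, 3))
```

This only verifies a single release. It does not detect the complementary release attack, which requires comparing the new table against everything already published.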



L-diversity principles
L-diversity principle: A q-block is l-diverse if it
contains at least l well-represented values
for the sensitive attribute S. A table is
l-diverse if every q-block is l-diverse
l-diversity may be difficult and unnecessary to achieve.

A single sensitive attribute
Two values: HIV positive (1%) and HIV negative (99%)
Very different degrees of sensitivity
l-diversity is unnecessary to achieve
2-diversity is unnecessary for an equivalence class that contains
only negative records
l-diversity is difficult to achieve
Suppose there are 10000 records in total
To have distinct 2-diversity, there can be at most 10000*1%=100
equivalence classes
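The arithmetic on this slide can be made concrete. The sketch below assumes the simplest reading of "well represented", namely distinct l-diversity (each q-block contains at least l different sensitive values), and reproduces the bound of 100 equivalence classes for 10,000 records with 1% positives. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every q-block holds at least l distinct sensitive values."""
    blocks = defaultdict(set)
    for r in records:
        blocks[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in blocks.values())


def max_two_diverse_classes(total_records, positive_percent):
    """With two sensitive values, each 2-diverse class needs at least one
    record of the rarer value, so the rarer value's count bounds the number of classes."""
    return total_records * positive_percent // 100


# Hypothetical table: the second block contains only negative records.
records = [
    {"zip": "7508*", "hiv": "negative"},
    {"zip": "7508*", "hiv": "positive"},
    {"zip": "7525*", "hiv": "negative"},
    {"zip": "7525*", "hiv": "negative"},
]

print(is_distinct_l_diverse(records, ("zip",), "hiv", 2))
print(max_two_diverse_classes(10000, 1))
```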

Privacy Preserving Distributed Data Mining
Goal of data mining is summary results
Association rules
Classifiers
Clusters
The results alone need not violate privacy
Contain no individually identifiable values
Reflect overall results, not individual organizations
The problem is computing the results without access to
the data!
Data needed for data mining may be distributed among parties
Credit card fraud data
Inability to share data due to privacy reasons
HIPAA
Even partial results may need to be kept private

Secure Multi-Party Computation (SMC)
The goal is computing a function $f(x_1, x_2, \ldots, x_n)$
without revealing $x_i$

Semi-Honest Model
Parties follow the protocol
Malicious Model
Parties may or may not follow the protocol
We cannot do better than the situation in which a
trusted third party exists
Generic SMC is too inefficient for PPDDM
Enhancements being explored
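As one concrete instance of computing f(x_1, ..., x_n) without revealing any x_i, the sketch below simulates a secure sum based on additive secret sharing in the semi-honest model. This is a textbook illustration chosen for brevity, not a protocol taken from the slides. The modulus and function names are assumptions, and all parties are simulated in a single process.

```python
import secrets

# Assumed modulus; inputs are non-negative and their sum must stay below it.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate the protocol: party i sends its j-th share to party j."""
    n = len(private_inputs)
    distributed = [share(x, n, modulus) for x in private_inputs]
    # Party j only sees column j, which is uniformly random on its own.
    partial_sums = [sum(row[j] for row in distributed) % modulus for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial_sums) % modulus


print(secure_sum([12, 30, 7]))
```

Each party learns the final sum and nothing else, provided parties follow the protocol and do not collude. A malicious party could still submit a false input, which is also possible when a trusted third party is used.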
Graph Model
Graph represented by a set of homogeneous
vertices and a set of homogeneous edges
Each node also has a set of Details, one of
which is considered private.

Lindamood et al. 09 &
Heatherly et al. 09
Naïve Bayes Classification
Classification based only on specified
attributes in the node

Lindamood et al. 09 &
Heatherly et al. 09
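A details-only classifier of this kind can be sketched in a few lines. The code below is an illustrative Naïve Bayes over sets of (trait name, trait value) pairs with Laplace smoothing. It is not the cited authors' implementation. The training profiles are made up, and details never seen in training are simply ignored, which is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict


def train(profiles):
    """profiles: list of (set of details, class label) pairs."""
    class_counts = Counter(label for _, label in profiles)
    detail_counts = defaultdict(Counter)
    vocabulary = set()
    for details, label in profiles:
        detail_counts[label].update(details)
        vocabulary |= details
    return class_counts, detail_counts, vocabulary


def predict(model, details):
    """Return the label maximizing log P(C) + sum of log P(detail | C)."""
    class_counts, detail_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for d in details:
            if d in vocabulary:  # unseen details are skipped (assumption)
                # Laplace-smoothed estimate of P(detail listed | label).
                score += math.log((detail_counts[label][d] + 1) / (n_label + 2))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training profiles.
training = [
    ({("group", "the democratic party"), ("religious views", "agnostic")}, "liberal"),
    ({("group", "the democratic party")}, "liberal"),
    ({("group", "college republicans")}, "conservative"),
    ({("group", "college republicans"), ("religious views", "agnostic")}, "conservative"),
]
model = train(training)
print(predict(model, {("group", "the democratic party")}))
```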
Naïve Bayes with Links
Rather than calculate the probability from
person $n_x$ to $n_y$, we calculate the probability of
a link from $n_x$ to a person with $n_y$'s traits

Lindamood et al. 09 &
Heatherly et al. 09
Link Weights
Links also have associated weights
Represents how close a friendship is
suspected to be using the following formula:

Lindamood et al. 09 &
Heatherly et al. 09
Collective Inference
Collection of techniques that use node
attributes and the link structure to refine
classifications.
Uses local classifiers to establish a set of
priors for each node
Uses traditional relational classifiers as the
iterative step in classification

Lindamood et al. 09 &
Heatherly et al. 09
Relational Classifiers
Class Distribution Relational Neighbor
Weighted-Vote Relational Neighbor
Network-only Bayes Classifier
Network-only Link-based Classification

Lindamood et al. 09 &
Heatherly et al. 09
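To illustrate how a relational classifier plugs into collective inference, the sketch below uses a weighted-vote relational neighbor update inside a simplified, synchronous relaxation-style loop. Local-classifier priors initialize the unlabeled nodes, labeled nodes stay fixed, and each iteration replaces a node's probability with the link-weighted average of its neighbors' probabilities. This is a simplified assumption-laden version (binary class, no decay schedule), not the exact procedure of the cited work. All names and data are hypothetical.

```python
def collective_inference(neighbors, priors, known, iterations=10):
    """
    neighbors: node -> {neighbor: link weight}
    priors:    node -> P(class = 1) from a local classifier (unlabeled nodes)
    known:     node -> 0 or 1 for nodes whose label is given
    Returns node -> estimated P(class = 1).
    """
    probs = {n: float(known[n]) if n in known else priors[n] for n in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known or not nbrs:
                updated[node] = probs[node]  # labeled or isolated: keep as is
                continue
            total_weight = sum(nbrs.values())
            # Weighted vote over the neighbors' current estimates.
            updated[node] = sum(w * probs[m] for m, w in nbrs.items()) / total_weight
        probs = updated
    return probs


# Hypothetical graph: b is tied more strongly to a (class 1) than to c (class 0).
graph = {
    "a": {"b": 3.0},
    "b": {"a": 3.0, "c": 1.0},
    "c": {"b": 1.0},
}
result = collective_inference(graph, priors={"b": 0.5}, known={"a": 1, "c": 0})
print(result["b"])
```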
Experimental Data
167,000 profiles from the Facebook online
social network
Restricted to public profiles in the Dallas/Fort
Worth network
Over 3 million links

Lindamood et al. 09 &
Heatherly et al. 09
General Data Properties
Property | Value
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | .45
Probability Conservative | .55
Lindamood et al. 09 &
Heatherly et al. 09
Inference Methods
Details only: Uses a Naïve Bayes classifier to
predict attribute
Links Only: Uses only the link structure to
predict attribute
Average: Classifies based on an average of
the probabilities computed by Details and
Links

Lindamood et al. 09 &
Heatherly et al. 09
Predicting Private Details
Attempt to predict the value of the political
affiliation attribute
Three Inference Methods used as the local
classifier
Relaxation labeling used as the Collective
Inference method


Lindamood et al. 09 &
Heatherly et al. 09
Removing Details
Ensures that no false information is added to
the network: all details in the released graph
were entered by the user
Details that have the highest global
probability of indicating political affiliation
removed from the network

Lindamood et al. 09 &
Heatherly et al. 09
Removing Links
Ensures that the link structure of the released
graph is a subset of the original graph
Removes, from each node, the links to the neighbors
that are most like that node

Lindamood et al. 09 &
Heatherly et al. 09
Most Liberal Traits
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927
Group | buck fush | 27.05782866
Lindamood et al. 09 &
Heatherly et al. 09
Most Conservative Traits
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487
Lindamood et al. 09 &
Heatherly et al. 09
Most Liberal Traits per Trait Name
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985
Lindamood et al. 09 &
Heatherly et al. 09
Experiments
Conducted on 35,000 nodes which recorded
political affiliation
Tests removing 0 details and 0 links, 10
details and 0 links, 0 details and 10 links, and
10 details and 10 links
Varied Training Set size from 10% of
available nodes to 90%

Lindamood et al. 09 &
Heatherly et al. 09
Local Classifier Results
Lindamood et al. 09 &
Heatherly et al. 09
Collective Inference Results
Lindamood et al. 09 &
Heatherly et al. 09
Online Social Networks Access Control
Issues
Current access control systems for online
social networks are either too restrictive or
too loose
selected friends (Bebo, Facebook, and Multiply)
neighbors, i.e., the set of users having musical preferences
and tastes similar to mine (Last.fm)
friends of friends (Facebook, Friendster, Orkut)
contacts of my contacts (2nd-degree contacts), 3rd- and
4th-degree contacts (Xing)


Challenges
"I want only my family and close friends to see this picture."
Requirements
Many different online social networks with different
terminology
Facebook vs. LinkedIn
We need to have flexible models that can represent
User profiles
Relationships among users
(e.g., Bob is Alice's close friend)
Resources
(e.g., online photo albums)
Relationships among users and resources
(e.g., Bob is the owner of the photo album and Alice is tagged in
this photo)
Actions (e.g., post a message on someone's wall)


Overview of the Solution
We use semantic web technologies (e.g.,
OWL) to represent the social network knowledge
base.

We use the Semantic Web Rule Language (SWRL)
to represent various security, admin, and filter
policies.
Modeling User Profiles and Resources
Existing ontologies such as FOAF could be
extended to capture user profiles.




Relationships among resources could be
captured by using OWL concepts
PhotoAlbum rdfs:subClassOf Resource
PhotoAlbum consistsOf Photos






Modeling Relationships Among Users
We model relationships among users by defining an N-ary
relationship
# The default prefix is assumed to be bound to a (hypothetical) OSN namespace.
:Christine
a :Person ;
:has_friend _:Friendship_Relation_1 .
# The blank node carries the attributes of this particular friendship.
_:Friendship_Relation_1
a :Friendship_Relation ;
:Friendship_trust :HIGH ;
:Friendship_value :Mike .
OWL reasoners cannot be used to infer some relationships,
such as Christine being a third-degree friend of John.
Such computations need to be done separately and represented
by using a new class.
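Degree-of-friendship facts that a reasoner cannot derive can be precomputed outside the knowledge base and then asserted into it. The sketch below is one possible way to do that with a breadth-first search over the friendship graph. The function name, the intermediate user Ann, and the adjacency data are hypothetical.

```python
from collections import deque


def friend_degrees(friends, source, max_degree):
    """Return {user: degree} for every user within max_degree hops of source."""
    degree = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if degree[u] == max_degree:
            continue  # do not expand beyond the requested degree
        for v in friends.get(u, ()):
            if v not in degree:
                degree[v] = degree[u] + 1
                queue.append(v)
    del degree[source]
    return degree


# Hypothetical friendship graph: Christine - Mike - Ann - John.
friends = {
    "Christine": ["Mike"],
    "Mike": ["Christine", "Ann"],
    "Ann": ["Mike", "John"],
    "John": ["Ann"],
}

print(friend_degrees(friends, "Christine", 3))
```

Each resulting pair with degree 3 could then be written back as an instance of a dedicated class or property for third-degree friends, so that policies can refer to it.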



Specifying Policies Using OSN Knowledge
Base
Most of the OSN information
could be captured using OWL to
represent a rich set of concepts
This makes it possible to specify
very flexible access control
policies
"Photos could be accessed by
friends only" automatically
implies that a closeFriend can access
the photos too.
Policies could easily be defined
based on user-resource
relationships.

Security Policies for OSNs
Access control policies
Filtering policies
Could be specified by the user
Could be specified by an authorized user
Admin policies
The security admin specifies who is authorized to specify
filtering and access control policies
Example: if U1 isParentOf U2 and U2 is a child, then
U1 can specify filtering policies for U2.

Security Policy Specification (using
semantic web technologies)
Semantic Web Rule Language (SWRL) is used for
specifying access control, filtering and authorization
policies.
SWRL is based on OWL:
all rules are expressed in terms of OWL concepts
(classes, properties, individuals, literals).
Using SWRL, the subject, object, and actions are
specified
Rules can have different authorizations that state the
subject's rights on the target object.






Knowledge Base for Authorizations and
Prohibitions
Authorizations/prohibitions need to be specified
using OWL
A different object property for each action
supported by the OSN.
Authorizations/prohibitions could automatically
propagate based on action hierarchies
Assume post is a subproperty of write
If a user is given the post permission, then the user
will have the write permission as well
Admin prohibitions need to be specified slightly
differently. (Supervisor, Target, Object, Privilege)
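The propagation along an action hierarchy can be mimicked outside a reasoner to see what it implies. The sketch below assumes a single-parent subproperty map containing only the slide's example (post is a subproperty of write). The function names, users, and resources are hypothetical, and a real deployment would rely on OWL subproperty reasoning instead.

```python
# Action hierarchy from the slide: post is a subproperty of write.
SUBPROPERTY_OF = {"post": "write"}


def implied_actions(action, subproperty_of):
    """Return the action plus every super-action it entails."""
    implied = [action]
    seen = {action}
    while action in subproperty_of:
        action = subproperty_of[action]
        if action in seen:  # guard against a cyclic hierarchy
            break
        implied.append(action)
        seen.add(action)
    return implied


def is_authorized(granted, user, action, resource, subproperty_of=SUBPROPERTY_OF):
    """granted: set of (user, action, resource) authorizations."""
    return any(
        action in implied_actions(g_action, subproperty_of)
        for g_user, g_action, g_resource in granted
        if g_user == user and g_resource == resource
    )


granted = {("alice", "post", "bobs_wall")}
print(is_authorized(granted, "alice", "write", "bobs_wall"))
```

The propagation is one-directional: granting write does not entail post, because the subproperty relation only states that every post assertion is also a write assertion.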


Security Rule Examples
SWRL rule specification depends on the
authorization and OSN knowledge bases.
It is not possible to specify generic rules
Examples:


Security Rule Enforcement
A reference monitor evaluates the requests.
Admin requests for access control could be
evaluated by rule rewriting
Example: Assume Bob submits the following
admin request


Rewrite as the following rule



Security Rule Enforcement
Admin requests for Prohibitions could be rewritten as
well.
Example: Bob issues the following prohibition request


Rewritten version



Access control requests need to consider both filter and
access control policies


Framework Architecture
[Figure: architecture diagram. Components: Social Network Application, Reference Monitor, Semantic Web Reasoning Engine, Policy Store, SN Knowledge Base. Arrow labels: Access request, Access Decision, Modified Access request, Policy Retrieval, Reasoning Result, Knowledge Base Queries.]
Conclusions
Various attacks exist to
Identify nodes in anonymized data
Infer private details
Recent attempts to increase social network access control to
limit some of the attacks
Balancing privacy, security and usability on online social
networks will be an important challenge
Directions
Scalability
We are currently implementing such a system to test its scalability.
Usability
Create techniques to automatically learn rules
Create simple user interfaces so that users can easily specify these
rules.
