
Trust in Social Networks

• People annotate their relationships with
information about how much they trust their
friends
• Trust can be binary (trust or don’t trust) or on
some scale
– This work uses a 1-10 scale where 1 is low trust and 10
is high trust
• At least 8 social networks have some mechanism
for expressing trust explicitly; several dozen have
implicit trust information
Using Trust from Social Networks
• If we have trust available from a social
network, how can we use that?
• Trust in people can influence how likely we
are to
– Give them access to information
– Accept information from them at all
– Consider the quality of information from them
Examples
• Only people I trust can see my phone
number
• I will only accept emails from people I trust
Challenges to Using Trust
• Each person only knows a very small
part of the network
• For people we know, some automatic use of
trust may be helpful, but it does not provide
any new information
• If we have access to the network, we need a
way to compute how much we should trust
others
Inferring Trust
The Goal: Select two individuals - the source
(node A) and sink (node C) - and recommend
to the source how much to trust the sink.
[Figure: chain A → B → C with known trust values tAB and tBC; tAC is the value to infer]
Caveats and Insights
• Trust is contextual
• Trust is asymmetric
• Trust is not exactly transitive
[Figure: example network with the source and sink nodes labeled]
Trust Algorithm
• If the source does not know the sink, the source
asks all of its friends how much to trust the sink,
and computes a trust value by a weighted average
• Neighbors repeat the process if they do not have a
direct rating for the sink
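The recursive weighted average described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `ratings` dictionary layout, the function name `infer_trust`, the use of the source's own rating of each neighbor as the weight, and the rule for skipping nodes already on the current path are all assumptions made for the example.

```python
def infer_trust(ratings, source, sink, _visiting=None):
    """Recommend how much `source` should trust `sink` (1-10 scale).

    `ratings[a][b]` is a's direct trust rating of b. Returns None when no
    chain of ratings leads from source to sink.
    """
    direct = ratings.get(source, {})
    if sink in direct:
        # The source already knows the sink: use the direct rating.
        return float(direct[sink])
    visiting = (_visiting or frozenset()) | {source}
    weighted_sum = 0.0
    total_weight = 0.0
    for neighbor, weight in direct.items():
        if neighbor in visiting:
            # Assumption: ignore nodes already on the current path (avoids cycles).
            continue
        # Neighbors repeat the process if they have no direct rating.
        opinion = infer_trust(ratings, neighbor, sink, visiting)
        if opinion is None:
            continue
        # Assumption: each opinion is weighted by the source's trust in that neighbor.
        weighted_sum += weight * opinion
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


ratings = {"A": {"B": 8, "D": 4}, "B": {"C": 9}, "D": {"C": 3}}
print(infer_trust(ratings, "A", "C"))  # (8*9 + 4*3) / (8 + 4) = 7.0
```

The sketch explores every path, so it is only practical on small graphs; a real system would bound the search depth or cache intermediate results.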
How Well Does It Work?
• Pretty well
• On networks where we have tested it, trust
is computed accurately within about 10%
– Test this by taking a known trust value, deleting
the edge between those people, comparing the
known value with the value we compute
– 10% is very good for social systems with lots
of noise
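The edge-deletion test described above can be sketched as a leave-one-out loop. Everything named here is hypothetical: `one_hop_estimate` is a deliberately simple stand-in for the real inference algorithm, and expressing the error as a fraction of the 1-10 scale's range is an assumption about how "within about 10%" might be measured.

```python
import copy


def one_hop_estimate(ratings, source, sink):
    """Hypothetical stand-in: weighted average over direct friends only."""
    num = den = 0.0
    for friend, weight in ratings.get(source, {}).items():
        if sink in ratings.get(friend, {}):
            num += weight * ratings[friend][sink]
            den += weight
    return num / den if den else None


def leave_one_out_error(ratings, infer, scale=(1, 10)):
    """Mean absolute error of `infer`, as a fraction of the rating range."""
    errors = []
    for source, outgoing in ratings.items():
        for sink, known in outgoing.items():
            held_out = copy.deepcopy(ratings)
            del held_out[source][sink]        # delete the known edge
            estimate = infer(held_out, source, sink)
            if estimate is not None:          # skip pairs left with no path
                errors.append(abs(estimate - known))
    if not errors:
        return None
    return sum(errors) / len(errors) / (scale[1] - scale[0])


ratings = {"A": {"B": 8, "C": 9}, "B": {"C": 7}}
print(leave_one_out_error(ratings, one_hop_estimate))
```

Only the A-to-C edge can be re-inferred in this tiny graph (through B), so the result reflects that single comparison.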
Applications of Trust
• With direct knowledge or a
recommendation about how much to trust
people, this value can be used as a filter in
many applications
• Since social networks are so prominent on
the web, they are a public, accessible data
source for determining the quality of
annotations and information
Ordering
• Use trust to determine the order in which
information is presented

Aggregating

• If data is aggregated, we can use trust to determine
how much weight is given to different sources
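Both uses, ordering and aggregating, can be illustrated with a short sketch. The function names, the `(author, content)` pair layout, and the choice to treat unknown sources as having trust 0 are assumptions made for the example.

```python
def order_by_trust(items, trust):
    """Sort (author, content) pairs so the most trusted authors come first."""
    return sorted(items, key=lambda item: trust.get(item[0], 0), reverse=True)


def trust_weighted_average(values, trust):
    """Average per-source values, weighting each source by its trust."""
    total = sum(trust.get(source, 0) for source in values)
    if total == 0:
        return None  # no trusted source contributed
    return sum(trust.get(s, 0) * v for s, v in values.items()) / total


trust = {"alice": 9, "bob": 3}
reviews = [("bob", "meh"), ("alice", "great")]
print(order_by_trust(reviews, trust))
# [('alice', 'great'), ('bob', 'meh')]
print(trust_weighted_average({"alice": 4.0, "bob": 2.0}, trust))
# (9*4.0 + 3*2.0) / 12 = 3.5
```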

Social Networks for Science

Data + Provenance + Social Networks = Social Policies

Policies on the Web
• Policies on the web are used to filter and restrict
access to information for
– Security
– Privacy
– Trust
– Information filtering
– Accountability
• Important because of the open nature of the web

Applications of the Policy-Aware Web
• Website access
• Network routing
• Storage management
• Grid computing
• Pervasive computing
• Information filtering
• Digital rights management
• Collaboration

Applications and Industrial Interest
• Internet Content Rating Agency
– Using policies and rules to develop content ratings for
websites
• Efforts underway at
– Microsoft, IBM, Sun, BEA, Oracle
• Heavily discussed at W3C Workshop on
Constraints and Capabilities for Web Services
– http://www.w3.org/2004/09/ws-cc-program.html

Example Policies
• Only allow members of my research group
to access this data set
• Reject messages from anyone whose
address is not on my list of verified senders

Policies and Trust
• Only users whose inferred trust rating is a 9 or 10 may
run processes on this shared computing resource
• Preprints of this paper are accessible only to
trusted Fermilab personnel, members of the research
team at other institutions, or the NSF advisory board
• Include information in my knowledge base only if it,
and all the files and processes in its provenance, were
created or executed by people I trust at a level 7 or
above
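Two of the policies above can be written as simple checks. This is a sketch under stated assumptions: the function names, the `provenance` mapping (item to the files and processes it was derived from), and the `creator` mapping are hypothetical, and an item with no known or no rated creator is treated as untrusted.

```python
def may_run_processes(inferred_trust):
    """Policy: only users with an inferred trust rating of 9 or 10 may run processes."""
    return inferred_trust is not None and inferred_trust >= 9


def may_include(item, provenance, creator, trust, threshold=7):
    """Policy: include `item` only if it, and everything in its provenance,
    was created or executed by someone trusted at `threshold` or above."""
    pending, seen = [item], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        # Assumption: unknown or unrated creators count as trust 0.
        if trust.get(creator.get(current), 0) < threshold:
            return False
        pending.extend(provenance.get(current, []))
    return True


provenance = {"result.csv": ["clean.py", "raw.dat"]}
creator = {"result.csv": "ann", "clean.py": "bob", "raw.dat": "cat"}
trust = {"ann": 9, "bob": 8, "cat": 5}
print(may_include("result.csv", provenance, creator, trust))  # False: cat is below 7
```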

Extending Trust to Science
• In collaborative scientific environments,
some data and resources require strict
access control (username / password)
• For others, this level of control is
unnecessary and cumbersome

Trust for Access Control
• With a scientific social network, trust can be used
to restrict access to
– Data
– Computing resources
• Trust can also be used to
– Limit what data is integrated into a knowledge base
– Weight conflicting information from different sources
according to the trustworthiness of the source

Leading to Collaboration
• The semantic web with social networks
provides a platform for
– Publishing data
– Publishing metadata (so experiments can be
verified)
– Limiting/granting access to sensitive data
– Gathering data from other sources
– Filtering data from the web
What do we need to do?
• “Easy” Steps
– Building ontologies for representing
scientific data / metadata
– Publishing data on the web

What do we need to do?
• Hard Steps (because people don’t want to
do them)
– Developing web policies for limiting
access to non-critical data
• Webmasters can do this, with training and
collaboration with data owners
– Motivating scientists to join social networks
Forcing the Anti-Social Into Social Nets
• Can’t expect scientists to use a
Facebook/MySpace style social network
(and we probably don’t want to see that
anyway…)
• Integrate social networking into other
activities
– E.g. email

The Payoff
• A whole new way of working over the web
• Multiple levels of collaboration
• New ways of sharing data and working
together

Conclusions
• The intersection of the Semantic Web, social
networks, and science holds great promise for
revolutionizing collaboration over the web
• Steps to achieving it are mostly social, not
technological
– Motivating the use of these technologies among
everyone involved with data
– Introducing new ways to collaborate and
encouraging adoption of new techniques

Outline
• Location Mining
• Patented Algorithms
– Tweethood
– Tweecalization
– Tweeque

Importance of Location Mining
• The advances in location-acquisition and mobile
communication technologies empower people to use location
data with existing online social networks.
• The knowledge of location allows the user to expand his or her
current social network, explore new places to eat, etc.
• Just like time, location is one of the most important
components of user context, and further analysis can reveal
more information about an individual’s interests, behaviors,
and relationships with others.
• Three Uses: Privacy and Security, Trustworthiness, Marketing

Privacy and Security
• Location privacy is the ability of an individual to move in public space with the
expectation that under normal circumstances their location will not be
systematically and secretly recorded for later use.
• Many people apart from friends and family are interested in the information users
post on social networks.
– These include identity thieves, stalkers, debt collectors, con artists, and
corporations wanting to know more about consumers.
• Once collected, this sensitive information can be left vulnerable to access by the
government and third parties. And unfortunately, the existing laws give more
emphasis to the financial interests of corporations than to the privacy of
consumers.

Trustworthiness
• Trustworthiness is another reason location discovery is so important.
• It is well known that social media played a major role in accelerating the revolutionary wave of
demonstrations and protests in the Arab world known as the “Arab Spring”.
• The Department of State has effectively used social networking sites to gauge the sentiments within
societies.
• Maintaining a social media presence in deployed locations also allows commanders to understand
potential threats and emerging trends within the regions.
• The online community can provide a good indicator of prevailing moods and emerging issues.
• Many of the vocal opposition groups will likely use social media to air grievances publicly.
• In such cases, it becomes very important for organizations (like the US State
Department) to be able to verify the correct location of the users posting these messages.

Marketing
• Social media has an impact on both marketing and garnering feedback from consumers. First, social
media helps marketers communicate with peers and customers (both current and
future).
• It provides significantly more visibility for the company or the product and helps you to
spread your message in a relaxed and conversational way.
• The second major contribution of social media towards business is for getting feedback from
users.
• Social media gives you the ability to get the kind of quick feedback inbound marketers
require to stay agile.
• Large corporations from Wal-Mart to Starbucks are leveraging social networks, Twitter in
particular, beyond typical posts and updates to get feedback on the quality of their products
and services, especially recently launched ones.

Tweethood
• Tweethood is an algorithm for Agglomerative Clustering on Fuzzy k-Closest
Friends with Variable Depth. Graph-related approaches are the methods that rely
on the social graph of the user while deciding on the location of the user. In this
chapter, we describe three such methods that show the evolution of the algorithm
currently used in Tweethood.

• Each node in the graph represents a user and an edge represents friendship. The
root represents the user U whose location is to be determined, and F1, F2, …, Fn
represent the n friends of the user. Each friend can have his or her own network;
for example, F2 has a network comprising m friends F21, F22, …, F2m.

Naïve Approach
• A naïve approach for solving the location identification problem
would be to take a simple majority vote over the locations of friends
(followers and following) and assign the winner as the label of the user.
• Since most friends will not state a location explicitly,
we can go further and explore the social network of the friend
(friend of a friend).
• For example, if the location of friend F2 is not known, instead of
labeling it as null, we can go one step further and use F2’s friends
in choosing the label for it. It is important to note here that each
node in the graph will have just one label (a single location).
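The naïve majority vote with a friend-of-a-friend fallback can be sketched as below. The function name, the `friends` and `location` dictionaries, and the `depth` limit are assumptions for illustration; the source does not say how ties are broken, so the example avoids them.

```python
from collections import Counter


def majority_location(user, friends, location, depth=2):
    """Label `user` with the most common location among their friends.

    A friend with no stated location is labeled from his or her own friends,
    up to `depth` levels deep. Every node ends up with a single label.
    """
    if depth == 0:
        return None
    votes = Counter()
    for friend in friends.get(user, []):
        label = location.get(friend)
        if label is None:
            # Friend of a friend: go one step further instead of using null.
            label = majority_location(friend, friends, location, depth - 1)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


friends = {"U": ["F1", "F2", "F3"], "F2": ["F21", "F22", "U"]}
location = {"F1": "Dallas", "F3": "Austin", "F21": "Dallas", "F22": "Dallas"}
print(majority_location("U", friends, location))  # Dallas (F1 and F2 vs. F3)
```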

k-Closest Friends with Variable Depth
• As Twitter has a large majority of users with public profiles, a user has little
control over the people following him or her. In such cases, considering
spammers, marketing agencies, etc., while deciding on the user’s location can lead
to inaccurate results. Additionally, it is necessary to distinguish the influence of
each friend while deciding the final location. We therefore modify the naïve approach
to consider only the k closest friends of the user.
• Closeness between two people is a subjective term, and we can implement it in
several ways, including the number of common friends and the semantic relatedness between
the activities (verbs) of the two users collected from the messages posted by each
one of them. Based on the experiments we conducted, we adopted the number
of common friends as the optimum choice because of its low time complexity and
better accuracy.
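Selecting the k closest friends by number of common friends might look like the following sketch. The function name and data layout are hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
def k_closest_friends(user, friends, k):
    """Return the k friends of `user` who share the most friends with `user`."""
    own = set(friends.get(user, ()))

    def common(friend):
        # Closeness = number of friends the two users have in common.
        return len(own & set(friends.get(friend, ())))

    # Most common friends first; ties broken by name (assumption).
    return sorted(own, key=lambda f: (-common(f), f))[:k]


friends = {
    "U": ["A", "B", "C", "S"],
    "A": ["U", "B", "C"],
    "B": ["U", "A"],
    "C": ["U", "A"],
    "S": ["U"],  # e.g. a spammer with no friends in common
}
print(k_closest_friends("U", friends, 2))  # ['A', 'B']
```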

Fuzzy_k_Closest_Friends
• The idea behind fuzzy k-closest friends with variable depth is that
each node of the social graph is assigned multiple locations, each
associated with a certain probability. These labels are propagated throughout
the social network; no locations are discarded. At each level of depth
of the graph, the results are aggregated and boosted, as in the previous
approaches, so as to maintain a single vector of locations with their probabilities.
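The core idea, keeping a probability vector of locations per node instead of a single label, can be sketched as follows. This is a simplification under stated assumptions: all friends are weighted equally, the k-closest selection and the boosting step are omitted because the text does not define them, and the function and parameter names are hypothetical.

```python
from collections import defaultdict


def fuzzy_locations(user, friends, known, depth=2):
    """Return {location: probability} for `user`, discarding no candidate.

    `known[u]` is u's stated location, if any. An unlabeled node receives the
    normalized sum of its friends' vectors, computed up to `depth` levels deep.
    """
    if user in known:
        return {known[user]: 1.0}
    if depth == 0:
        return {}
    totals = defaultdict(float)
    for friend in friends.get(user, ()):
        vector = fuzzy_locations(friend, friends, known, depth - 1)
        for loc, p in vector.items():
            totals[loc] += p  # aggregate instead of picking one winner
    norm = sum(totals.values())
    return {loc: p / norm for loc, p in totals.items()} if norm else {}


friends = {"U": ["F1", "F2"], "F2": ["F21", "F22"]}
known = {"F1": "Dallas", "F21": "Dallas", "F22": "Austin"}
print(fuzzy_locations("U", friends, known))
# {'Dallas': 0.75, 'Austin': 0.25}
```

Here F2 contributes half of its vote to each city, so Austin survives as a candidate for U rather than being discarded by a majority vote.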

Tweecalization
• Graph-related approaches are the methods that rely on the social graph of the user
while deciding on the location of the user. As observed earlier, the location data of
users on social networks is a rather scarce resource, available for only a small
portion of the users.
• This creates a need for a methodology that makes use of both labeled and
unlabeled data for training. In this case, the location concept serves the purpose of
class label.
• Therefore, our problem is a classic example for the application of semi-supervised
learning algorithms. In this chapter, we propose a semi-supervised learning
method for label propagation.

Label Propagation

• The label propagation algorithm is based on transductive learning.
• In this environment, the dataset is divided into two sets.
• One is the training set, consisting of the labeled data.
• On the basis of this labeled data, we try to predict the class for the second set,
called the test or validation data, consisting of unlabeled data.
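A generic label propagation loop over a small weighted graph is sketched below to make the transductive setup concrete. It is a textbook-style illustration, not the Tweecalization method itself: the edge weights, the fixed iteration count, and all names are assumptions.

```python
def propagate_labels(edges, labeled, iterations=20):
    """Transductive label propagation over an undirected weighted graph.

    `edges[(u, v)]` is the similarity between u and v; `labeled[u]` is the
    known class of u. Returns the predicted class of every unlabeled node
    that receives any label mass.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    # Labeled (training) nodes start with, and keep, a one-hot distribution.
    scores = {n: ({labeled[n]: 1.0} if n in labeled else {}) for n in neighbors}
    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            if node in labeled:
                updated[node] = scores[node]  # clamp the known labels
                continue
            mixed = {}
            for other, w in neighbors[node].items():
                for label, p in scores[other].items():
                    mixed[label] = mixed.get(label, 0.0) + w * p
            total = sum(mixed.values())
            updated[node] = {l: p / total for l, p in mixed.items()} if total else {}
        scores = updated
    return {n: max(s, key=s.get) for n, s in scores.items() if n not in labeled and s}


edges = {("a", "x"): 1.0, ("x", "y"): 1.0, ("y", "b"): 3.0}
labeled = {"a": "Dallas", "b": "Austin"}
print(propagate_labels(edges, labeled))  # {'x': 'Dallas', 'y': 'Austin'}
```

The labeled nodes `a` and `b` play the role of the training set; `x` and `y` are the unlabeled test set whose classes are predicted from the graph.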

Trustworthiness and Similarity Measure
• The single most important thing is the way we define similarity (or distance) between two data points or,
in this case, users.
• We introduce the notion of trustworthiness for two specific reasons. First, we want to differentiate
between various friends when propagating the labels to the central user and second, to implicitly take into
account the social phenomenon of migration and thus provide for a simple yet intelligent way of defining
similarity between users.
• Trustworthiness (TW) is defined as the fraction of a user’s friends that have the same label as the user.
So, if a user, John Smith, mentions his location to be Dallas, Texas and 15 out of his 20 friends are from
Dallas, we say that the trustworthiness of John is 15/20 = 0.75.
• It is worthwhile to note here that users who have lived all their lives in a single city will have a large
percentage of their friends from the same city and hence will have a high trustworthiness value. On the
other hand, someone who has lived in several places will have a social graph consisting of people from all
over, and hence such a user should have little say when propagating labels to users with unknown
locations. For users without a location, TW is zero.
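The definition of TW translates directly into code. In this sketch the function name and dictionary layout are hypothetical, and returning 0 for a user with no friends is an assumption (the text only specifies 0 for users without a location).

```python
def trustworthiness(user, friends, location):
    """Fraction of a user's friends whose location matches the user's own."""
    own = location.get(user)
    circle = friends.get(user, [])
    if own is None or not circle:
        return 0.0  # no stated location (or, by assumption, no friends)
    same = sum(1 for friend in circle if location.get(friend) == own)
    return same / len(circle)


# John Smith: 15 of his 20 friends are from Dallas, Texas.
dallas = [f"d{i}" for i in range(15)]
elsewhere = [f"o{i}" for i in range(5)]
friends = {"john": dallas + elsewhere}
location = {"john": "Dallas, Texas"}
location.update({f: "Dallas, Texas" for f in dallas})
location.update({f: "Houston, Texas" for f in elsewhere})
print(trustworthiness("john", friends, location))  # 0.75
```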

Trustworthiness and Similarity Measure

• Friendship similarity between two people is a subjective term, and we can
implement it in several ways, including the number of common friends and the semantic
relatedness between the activities (verbs) of the two users collected from the
messages posted by each one of them.
• Based on the experiments we conducted, we adopted the number of common
friends as the optimum choice because of the low time complexity and better
accuracy.

Tweeque

• People migrate from city to city, state to state and country to country all the time.
• Therefore, our algorithms may be affected by such migration. That is, how does
one extract the location of a person when that person or their friends may be continually
migrating?
• Towards this end we have proposed a set of algorithms that we call Tweeque.
• That is, Tweeque takes into account the migration effect. In particular, it
identifies social cliques for location mining.

Directions
• Different Algorithms for Location Mining
• Other Demographics: Age, Gender, etc.
• Develop systems with real-world applications

Attacks on Social Media
• There are three types of attacks.
• The first is to attack the social media itself.
• The second is to attack computer systems, networks, and infrastructures through
social media.
• The third group consists of attacks specially formulated for social media systems.

Attacks on Social Media
• De-Anonymization Attacks: In this attack, hackers exploit the group
membership information of the members of a network and subsequently
identify the members.
• “Group information is available on social networking sites”.
• Specifically, the authors used web browser attacks to obtain the group membership
information.
• When a user who belongs to a group on the social network visits a malicious website, the
website carries out the de-anonymization attack formulated by the
hacker.
• Source: “A Practical Attack to De-Anonymize Social Network Users”,
Wondracek et al.

Attacks on Social Media
• Source: Seven Deadly Attacks; Timm and Perez
• The authors describe seven attacks that could occur, including malware attacks, phishing attacks, and
identity theft.
• For example, for malware attacks they state that there are two ways the malware
can compromise the network.
• One is a virus that infects the system, and the other is malware such as a
Trojan horse that could conceal information.
• They also explain the cross-site scripting (XSS) attack, where the malware
causes the user’s browser to execute the attacker’s code and compromises
the network.

Attacks on Social Media
• COMBOFIX List of Attacks: The COMBOFIX website lists several attacks on
social media.
• The Bad SEO attack attracts the user to a website that contains the malware. The
users are also lured to fake websites.
• The Pornspace malware is a worm that exploited a flaw in the security mailing list
of MySpace, stole the profiles of the users, and then sent porn-based spam.
• In the Over the Rainbow malware attack, the hackers embedded JavaScript code
into Twitter messages that can retweet.
• The user as well as the members of his/her network could be directed to porn
sites.
• In the Dislike Scam attack on Facebook, users were
given bogus surveys, and once they filled out the surveys they were attacked by
malware.

Attacks on Social Media
• Top Ten Attacks in Social Media: At the RSA conference in 2014, Gary Bahadur,
the CEO of KRAA Security, described various attacks on Facebook, Twitter, and
LinkedIn, as well as some other social media attacks.
– For example, he explained how an Android malware attack spread through
Facebook.
– This attack shows that the gadgets we use to connect to a social network site
can cause a serious attack on the site.
• Top Nine Social Media Threats of 2015: The Zerofox website published the top
nine social media threats including executive impersonations, corporate
impersonations, account takeover, customer scams and phishing attacks.
– An account takeover attack in 2015 was especially sinister as it affected the
United States Central Command (CENTCOM).

Attacks on Social Media
• Financial Times Report: On July 30, 2015, the Financial Times reported that
hackers are using Twitter to conceal intrusions.
– For example, the hackers used Twitter images to conceal malware and from
there attacked the computers they wanted to compromise.
– This attack appears to be similar to a steganographic attack, where suspicious
messages are embedded into media such as images and video.
• Link Privacy Attacks: In their article on link privacy, Effendy et al. discuss a
version of the link privacy attack.
– It essentially involves bribing or compromising some of the members (usually a
small number) in a social network and using them to obtain the link details (that
is, who their friends are) of those members who are not compromised.

• ASONAM 2009
• Joseph Bonneau, Jonathan Anderson, George Danezis:
Prying Data out of a Social Network. 249-254

• ASONAM 2010
• M. Saravanan, Garigipati Prasad, Karishma Surana, D. Suganthi:
Labeling Communities Using Structural Properties. 217-224

• Uffe Kock Wiil, Jolanta Gniadek, Nasrullah Memon:
Measuring Link Importance in Terrorist Networks. 225-232

Papers to Read for Exam #2
