Clustering Based Anonymization For Privacy Preservation
Abstract: When registering on a social networking site, users must provide personal information; some of this information is sensitive and needs to be protected. To preserve the privacy of users on a social network, an anonymization technique is employed. In the anonymization approach, an individual's personal information is either masked or removed from the dataset so that the individual's data becomes anonymous. When a dataset is released, it is important to prevent unwanted disclosure of data and to balance the usefulness and privacy of the published dataset. The proposed work gives the anonymized view of a dataset and the result of implementing the single-pass k-means anonymization algorithm. Generalization and suppression approaches are used to anonymize the dataset.

Keywords: Anonymization; Generalization; K-Means; Social Network; Suppression

978-1-4799-6272-3/15/$31.00(c)2015 IEEE

I. INTRODUCTION
Social networks have gained tremendous popularity on the internet. On social networks, individuals share data about their professional, business, and private lives. These networks store personal information about individuals; thus social networks become a vast pool of data. This information includes users' public data as well as private data. Public information is shared openly by individuals and is visible to everyone by default, whereas private information contains personal data that individuals want to be protected. Nowadays social network data is released and used by researchers and businesses for professional and analysis purposes. Directly releasing such information on the internet may harm the privacy of social network users and directly disclose individuals' information. Thus, to preserve privacy on social networks, many users prefer to hide their real identity; this is achieved by using an anonymization technique. Anonymization is the technique in which personal information and QI (Quasi Identifier) attributes are encrypted or removed from the dataset so that the individuals whom the dataset describes remain protected. When a dataset is released, it is important to prevent unwanted disclosure of data and to balance the usefulness and privacy of the released dataset. To minimize the risk of direct access to Quasi Identifier attributes in the dataset, generalization and suppression techniques are employed. In generalization, a Quasi Identifier attribute value is replaced with a more general value; in suppression, the original Quasi Identifier attribute value is replaced with a specific value. Applying generalization and suppression to the dataset prevents the danger of unwanted disclosure, maintains the confidentiality of the dataset, and preserves the secrecy of the data when the dataset is published for use in research and business. Anonymization is used to increase the privacy of social network users.

This work proposes anonymization techniques based on clustering. The clustering step uses a modified K-Means clustering algorithm to cluster the dataset. Generalization and suppression are the techniques used to anonymize the dataset and avoid direct access to it, which also protects the Quasi Identifier attributes from unwanted disclosure and preserves privacy. The rest of the paper is structured as follows. Section II gives a literature survey on anonymization. Section III gives an overview of clustering-based anonymization for privacy preservation of a dataset. Section IV gives the implementation of the single-pass K-Means anonymization algorithm, and Section V concludes the paper with future work.

II. LITERATURE SURVEY
Nowadays, maintaining privacy while revealing individuals' information over social networks is essential. In a brief survey of anonymization techniques, Zhou, Pei, and Luk [1] review the existing anonymization methods for preserving privacy when publishing social network data, identify the challenges in maintaining secrecy while publishing social network information, and analyze the issues in three important categories: privacy, background knowledge, and data utility. They divide anonymization approaches into clustering-based and graph-modification-based approaches. Elena Zheleva and Lise Getoor [2] present potential privacy breaches in OSNs (online social networks), along with some current privacy definitions and techniques for maintaining the confidentiality of users. Prateek Joshi and C.-C. Jay Kuo [3] give a mathematical formulation as well as computational models of privacy and security of social network data, and present metrics for computing the total amount of privacy as well as security in a social network. Meng-Cheng Wei and Jun-Lin Lin [4] present a model for k-anonymity. To reduce the loss of information during the generalization process used to anonymize the data, similar data must be combined into a single group so that each group forms an anonymous set of data; their clustering-based k-anonymity technique executes in O(n²/k) time. The authors experimentally compare their technique with other clustering-based k-anonymity techniques. A. Machanavajjhala, J. Gehrke, and D. Kifer [5] describe two types of attacks on k-anonymized datasets and propose a new, more powerful privacy approach called "l-diversity". Mingxun Yuan and Lie Chen [6] present a k-degree l-diversity representation for preserving structural data as well as individuals' personal labels, and develop a novel anonymization algorithm that adds noise nodes. N. Mogre, G. Agrawal, and P. Patil [7] review state-of-the-art methods for privacy in high-dimensional databases and present a new slicing technique for high-dimensional datasets. Dror J. Cohen and Tamir Tassa [8] address the issues of privacy protection on social networks; the goal is to obtain an anonymized view of the data without revealing any information about individuals or the relationships between them. The authors start with the centralized setting and provide two forms of anonymization algorithm based on sequential clustering.

[9]. Anonymized view of the dataset prevents individual's data from unwanted disclosure.

Fig. 1. Block Diagram of Complete Anonymization Process

IV. IMPLEMENTATION
A. Creation of Data Set
The social network dataset was created using MS Excel. The dataset has ten fields (attributes): UID, E-Mail, Age, Gender, Mobile No., Zip Code, Country, Relationship Status, Income, and Profession. Of these attributes, Age, Gender, Mobile No., and Relationship Status are the Quasi Identifier attributes, which are anonymized.
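
The generalization and suppression operations described in the Introduction, applied to the four Quasi Identifier attributes listed in Section IV-A, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the 10-year age ranges, the `*` masking symbol, the choice of which attribute is generalized and which is suppressed, the function names, and the sample record are all assumptions made for this example.

```python
# Illustrative sketch only; names, range width, and mask symbol are assumptions.

# The four Quasi Identifier attributes named in Section IV-A.
QUASI_IDENTIFIERS = ("Age", "Gender", "Mobile No.", "Relationship Status")


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with the range that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(value: str, mask: str = "*") -> str:
    """Suppression: replace the whole original value with a fixed mask value."""
    return mask


def mask_suffix(value: str, keep: int = 2, mask: str = "*") -> str:
    """Partial suppression: keep the first `keep` characters, mask the rest."""
    return value[:keep] + mask * max(len(value) - keep, 0)


def anonymize(record: dict) -> dict:
    """Return a copy of `record` with its Quasi Identifier attributes anonymized.

    Attributes that are not Quasi Identifiers are copied unchanged.
    """
    out = dict(record)
    out["Age"] = generalize_age(record["Age"])
    out["Gender"] = suppress(record["Gender"])
    out["Mobile No."] = mask_suffix(record["Mobile No."])
    out["Relationship Status"] = suppress(record["Relationship Status"])
    return out


# Hypothetical record with the ten attributes of the dataset.
sample = {
    "UID": 1,
    "E-Mail": "user@example.com",
    "Age": 27,
    "Gender": "Female",
    "Mobile No.": "9876543210",
    "Zip Code": "411001",
    "Country": "India",
    "Relationship Status": "Single",
    "Income": 50000,
    "Profession": "Engineer",
}

anonymized = anonymize(sample)
for name in QUASI_IDENTIFIERS:
    print(name, "->", anonymized[name])
```

Replacing Gender and Relationship Status with a single fixed symbol, instead of one symbol per character, avoids leaking the length of the original value. A wider age range gives more privacy at the cost of data utility, which is the trade-off the paper highlights.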