Big Data-A Study of Security and Privacy Challenges
Vijaya Kumar KS
Tavant Technologies
Bangalore, India
vijaykalanji@gmail.com
I. INTRODUCTION, CHALLENGES
In today's world we deal with a diverse set of data. Users
generate content by accessing social networks, and large
servers record information about their activity in log files.
These data come from heterogeneous sources.
the audit layer takes the auditors' requests and returns the
required information to them.
Besides these, many organizations are developing auditing
tools on top of Hadoop. For instance, Cloudera has come up
with its own data visualization and auditing tool, and IBM has
built IBM InfoSphere Guardium[18] on top of Hadoop.
10) Data provenance: The term data provenance refers to
the origin and creation of data. Digital data is often copied
and transferred to other systems before it reaches its intended
destination, so it is very difficult to identify the original
source. Data provenance is important because, among other
things, it enables us to evaluate the quality and
trustworthiness of data. In the context of big data, provenance
is called Big Provenance[19]. It is a new field within big data
that is yet to be explored further.
One way to identify the provenance of the data is to use a
rule engine to pick out provenance-related entries in the
logs. The limitation of this approach is that it must sift
through a large volume of data. Work is in progress on a
modified version of Hadoop suited to provenance needs, called
HadoopProv[20]. This tool captures data provenance at the
record level and is designed to keep the temporal overhead of
capturing provenance information in MapReduce jobs minimal:
HadoopProv has an overhead below 10% on typical job runtime.
Additionally, it has been demonstrated that provenance queries
are serviceable in O(k log n), where n is the number of records
per Map task and k is the number of Map tasks in which the key
appears.
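
To make the O(k log n) bound concrete, here is a minimal sketch of a
record-level provenance lookup. It is not HadoopProv's implementation.
It assumes a hypothetical index that stores, for each Map task, a
key-sorted list of (key, input_offset) pairs, and it assumes the
caller already knows which k Map tasks contain the key. The names
build_index and query_provenance are illustrative only.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical index layout (an assumption, not HadoopProv's format):
# Map task id -> list of (key, input_offset) pairs sorted by key.
ProvenanceIndex = Dict[str, List[Tuple[str, int]]]


def build_index(task_records: Dict[str, List[Tuple[str, int]]]) -> ProvenanceIndex:
    """Sort each Map task's provenance records by key, once, at capture time."""
    return {task: sorted(records) for task, records in task_records.items()}


def query_provenance(
    index: ProvenanceIndex, key: str, tasks_with_key: List[str]
) -> List[Tuple[str, int]]:
    """Return (task_id, input_offset) for every input record that produced `key`."""
    hits: List[Tuple[str, int]] = []
    for task in tasks_with_key:  # k iterations, one per Map task holding the key
        records = index[task]
        # Binary search over n sorted records: O(log n).
        # Offsets are assumed non-negative, so (key, -1) sorts before any match.
        i = bisect_left(records, (key, -1))
        # Collect the contiguous run of records with a matching key.
        while i < len(records) and records[i][0] == key:
            hits.append((task, records[i][1]))
            i += 1
    return hits


index = build_index({
    "map_0": [("cat", 40), ("ant", 0), ("cat", 12)],
    "map_1": [("dog", 7), ("cat", 3)],
    "map_2": [("eel", 5)],
})
result = query_provenance(index, "cat", ["map_0", "map_1"])
print(result)
```

Each of the k tasks costs one binary search over its n records, which
gives the k log n term. Reading out the matching records adds time
proportional to the number of matches.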
IV. CONCLUSION
In this paper I have tried to explain what big data is and why
it matters in general, and to describe the security and privacy
issues related to big data in particular. All the major areas
related to privacy and security are covered, and the current
work in each area has been mentioned, along with the
components, tools, and techniques used to address these
issues. Though the treatment of the subject is not in depth, an
attempt has been made to provide a meaningful discussion of
the topic. I hope that this will serve as a good starting point
for other researchers in this area.
REFERENCES
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[1]
[2]
[3]
[4]
[23]
[24]
[25]
[26]