
Proceedings of the International Conference, “Computational Systems and Communication Technology”
5TH MAY 2010, by Einstein College of Engineering,
Tirunelveli, Tamil Nadu, PIN-627 012, INDIA

AN EFFICIENT MECHANISM FOR HANDLING INFERENCES IN DATABASES
Asha Philip¹, T. Samraj Lawrence²
¹Student, II M.E. CSE; ²Lecturer
Department of Computer Science & Engineering,
Francis Xavier Engineering College, Tirunelveli

ABSTRACT
Access control mechanisms alone are insufficient to protect the sensitive data that resides in various data sources from indirect attacks. Users may retrieve a series of innocuous pieces of information and then employ inference techniques to derive sensitive data from that information. To provide more security, an inference detection system is developed. Its objective is to prevent malicious users from inferring sensitive information through the data they are authorized to access. When multiple users pose various queries aimed at inferring sensitive data, the detection system examines their past query history. Based on the acquired knowledge, a Semantic Inference Model (SIM) is constructed to identify the relationships among the data. Based on the SIM, the violation detection system keeps track of each user's query history, and the inference probability is calculated from the previously posted queries. If the inference probability exceeds a prespecified threshold, the current query request is denied. An example is given to illustrate how the proposed technique prevents multiple collaborating users from deriving sensitive information via inference.

Keywords - Security and privacy protection, operating systems, software engineering, inference engines, deduction and theorem proving, and knowledge processing.

INTRODUCTION
Privacy is one of the important research issues in building next-generation information systems. The confidentiality problem has become more challenging with the growing popularity of social network services such as Friendster, Blogger and Myspace, where people publish personal profiles and reveal their social relations. Malicious users may be able to infer sensitive information from such data, and most existing privacy protection techniques are inadequate for handling these aspects. Bayesian networks are used to model the social network so as to capture the causal relationships among the data.

Generalizing from a single-user to a multi-user collaborative system greatly increases the complexity of the inference detection system. For example, suppose one of the sensitive attributes in the system can be inferred from four different inference channels, and there are two collaborators, each of whom poses queries on two separate channels. Under individual inference violation detection, neither user violates the inference threshold based on their own query answers. However, if the two users share information, then the aggregated knowledge from the four inference
channels can cause an inference violation. This motivates us to extend our research to the multiple-user case, where users may collaborate with each other to jointly infer sensitive data.

THE INFERENCE CONTROLLER FRAMEWORK
This section introduces a general framework for the inference detection system, which includes the knowledge acquisition module, the semantic inference model and the violation detection module. The Knowledge Acquisition module determines how to acquire and represent the knowledge that could generate inference channels.

Fig 1: Framework of inference system

The proposed inference detection system (Fig. 1) consists of three modules: knowledge acquisition, the semantic inference model (SIM), and security violation detection, including user collaboration relation analysis. The Knowledge Acquisition module extracts data dependency knowledge, data schema knowledge and domain semantic knowledge. Based on the database schema and the data sources, the data dependencies between attributes within the same entity and among entities are derived. A semantic inference model can then be constructed from the acquired knowledge. The Semantic Inference Model (SIM) is a data model that represents all the possible relationships among the attributes of the data sources. The Semantic Inference Graph (SIG) is constructed by instantiating the entities and attributes in the SIM. For a given query, the SIG provides the inference channels for inferring sensitive information. The Violation Detection module combines the new query request with the request log and checks whether the current request exceeds the prespecified threshold of information leakage.

The previous work on data inference mainly focused on deriving probabilistic data dependency, relational database schema, and domain-specific semantic knowledge, representing them as probabilistic inference channels in a SIM, and proposing an inference detection framework for multiple collaborative users with static fields only. To remedy this shortcoming, we propose a probabilistic inference
approach to the query-time inference detection problem. The contributions of the paper are: 1) deriving probabilistic inference channels in a SIM that supports adaptable changes; 2) mapping the instantiated SIM into a Bayesian network for efficient and scalable inference computation; and 3) proposing an inference detection framework for multiple collaborative users with dynamic fields.

KNOWLEDGE ACQUISITION FOR DATA INFERENCE
Since users may pose queries and acquire knowledge from different sources, we need to construct a SIM that allows the detection system to track user inference intention. The SIM requires the system to acquire knowledge from data dependency, database schema, and domain-specific semantic knowledge, and to represent this knowledge as extra inference channels in the SIM.

SEMANTIC INFERENCE MODEL
A SIM consists of links between related attributes (structure) and their corresponding conditional probabilities (parameters). We assume that the links between attributes are fixed and derive the conditional probability table for each attribute. There are three types of relation links: dependency links, schema links and semantic links, as shown in Fig 2.

Fig 2: A SIM example for Airports, Runways, and Aircraft

For example, the semantic knowledge "can land" between Runway and Aircraft implies that the length of the runway should be greater than the minimum landing distance of the aircraft, and the width of the runway should be greater than the minimum width required by the aircraft. If we know the runway requirements of aircraft C-5, and C-5 "can land" on runway instance r, then the values of the attributes length and width of r can be inferred from the semantic knowledge. Therefore, we want to capture the domain-specific semantic knowledge as extra inference channels in the SIM.

Fig 3: The semantic link "can land" between "Aircraft_Min_Land_Dist" and "Runway_Length"

As shown in Fig. 3, the semantic relation "can land" between Runway and Aircraft implies that the length of the runway is greater than the minimum required aircraft landing distance. Thus,
the source node is aircraft_min_land_dist, and the target node is runway_length. Both attributes can take three values: "short," "medium," and "long." First, we add the value "unknown" to the source node aircraft_min_land_dist and set it as the default value. Then, we update the conditional probabilities of the target node to reflect the semantic relationship. Here, we assume that runway_length has an equal probability of being short, medium, or long. When the source node is set to "unknown," runway_length is independent of aircraft_min_land_dist; when the source node has a known value, the semantic relation "can land" requires that runway_length be greater than or equal to aircraft_min_land_dist.

CONDITIONAL PROBABILITY TABLE
A conditional probability table (CPT) is attached to each node of the directed acyclic graph and quantifies how that node is directly influenced by its parent nodes. The CPT is constructed by assigning default values to each attribute, such as small, medium, large, wide and narrow, and information is then derived from the CPTs. If a query is posed more frequently, the values in the CPT change, so the CPT must be updated as queries arrive. The probability values are calculated by taking the average of the probability values of every attribute. The CPT for the attribute "TAKEOFF_LANDING_CAPACITY" summarizes its dependency on its parent nodes. The conditional probabilities in the CPT can be derived from the database content. The CPT for runway_length is given in Fig. 4.

Conditional probability of runway_length given aircraft_min_land_dist

                     aircraft_min_land_dist
runway_length    unknown   small   medium   large
small            0.33      0.33    0        0
medium           0.33      0.33    0.5      0
large            0.33      0.33    0.5      1

Fig 4: CPT of runway_length

EVALUATING INFERENCE IN SEMANTIC INFERENCE GRAPH
For a given SIG, there are many feasible inference channels that can be formed by linking the set of dependent attributes. Therefore, we propose to map the SIG to a Bayesian network to reduce the computational complexity of evaluating the user inference probability for the sensitive attributes. The Probabilistic Relational Model (PRM) is an extension of the Bayesian network that integrates schema knowledge from relational data sources. Specifically, a PRM uses the relational structure to model dependencies between related entities. Therefore, in a PRM, an attribute can have two distinct types of parent-child dependencies, namely dependency within an entity
and dependency between related entities, which match the two types of dependency links in the SIM.

INFERENCE VIOLATION DETECTION FOR MULTIPLE USERS
Generalizing from a single-user system to a multi-user collaborative system greatly increases the complexity of inference violation detection. This is related to collaboration effectiveness, which involves three parameter values. The corresponding SIM for the airport "LAX" is shown in Fig 5.

Fig 5: The SIM for a transportation mission planning example.

INFERENCE CALCULATION
Information is derived from the conditional probability tables. If a query is posed more frequently, the inference values change, so the conditional probability table must be updated with the newly posted query values. The probability values are calculated by taking the average of the probability values of every attribute, and the inference probability is then calculated from the conditional probability table. By calculating the inference probability, we can determine whether it is high or low. If the inference probability exceeds the prespecified threshold, the user is acquiring sensitive data; otherwise, the user is unable to acquire sensitive data.

NEW CONDITIONAL PROBABILITY TABLE
A new conditional probability table is constructed when we want to make adaptable changes to the old conditional probability table. In the new conditional probability table, we can add a new field, remove a field, or update a field. By supporting such updates, the system provides more security for sensitive information.

HISTOGRAM
The histogram is used to represent the relationship between the given attributes. It shows the level of inference, that is, how much of the data has been inferred by the user. The inference level in the histogram is used to indicate
how much the user has tried to infer the data.

BAYESIAN NETWORK
A Bayesian network is a simple graphical representation of conditional independence assertions, and it provides a natural way to express conditional independence. It is a directed acyclic graph whose nodes are a set of random variables and whose directed links connect pairs of nodes. Sensitivity analysis of the attributes in the Bayesian network is performed to study the sensitivity of the inference channels. It reveals that the nodes closer to the security node have stronger inference effects on the security node. Thus, a sensitivity analysis of these close nodes can assist domain experts in specifying the threshold of the security node to ensure its robustness.

SENSITIVITY ANALYSIS
After the data administrator proposes a threshold value based on the required protection level, he/she can check the sensitivity values of the closest attributes on the inference channels. If one of these inference channels is too sensitive, meaning that a small change in an attribute value can cause the threshold to be exceeded, then the threshold needs to be tightened to make it less sensitive. In cases where the threshold cannot be lowered further to satisfy the sensitivity constraints, we can block access to the attribute closest to the security node on the most sensitive inference channel, so that the accessible nodes on that inference channel are less sensitive to the threshold of the security node.

CONCLUSION
In this paper, we present a technique that prevents users from inferring sensitive information from a series of seemingly innocuous queries. We extract the relationships among the various data and construct a SIM. To reduce the computational complexity of inference, a Bayesian network is used to evaluate the inference probability. For inference violation detection, we developed a collaborative inference model to derive the collaborative inference of sensitive information. Sensitivity analysis in the Bayesian network is performed to study the sensitivity of the inference channels; it reveals that the nodes closer to the security node have stronger inference effects on the security node. Thus, a sensitivity analysis of these close nodes can assist domain experts in specifying the threshold of the security node to ensure its robustness.
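As an illustrative sketch (not the authors' implementation), the CPT of runway_length in Fig. 4 can be generated mechanically from the "can land" rule described in the Semantic Inference Model section. The rule is applied as a uniform prior over runway_length: every length shorter than a known aircraft_min_land_dist is ruled out, and the remaining values are renormalized. The function name can_land_cpt and the use of exact fractions are assumptions made for this example. The value labels follow Fig. 4.

```python
from fractions import Fraction

# Ordered value domain shared by aircraft_min_land_dist and runway_length
# (labels as in Fig. 4, from shortest to longest).
LEVELS = ["small", "medium", "large"]
UNKNOWN = "unknown"  # default value added to the source node


def can_land_cpt(levels=LEVELS):
    """Return P(runway_length | aircraft_min_land_dist) for the "can land" link.

    Assumptions: runway_length is uniform a priori, and a known minimum
    landing distance rules out every runway length shorter than it.
    """
    levels = list(levels)
    cpt = {}
    for parent in [UNKNOWN] + levels:
        if parent == UNKNOWN:
            # No evidence: runway_length is independent of the parent.
            allowed = levels
        else:
            # "can land" requires runway_length >= aircraft_min_land_dist.
            allowed = levels[levels.index(parent):]
        p = Fraction(1, len(allowed))  # renormalize over the allowed values
        cpt[parent] = {v: (p if v in allowed else Fraction(0)) for v in levels}
    return cpt


if __name__ == "__main__":
    for parent, column in can_land_cpt().items():
        print(parent, {v: round(float(p), 2) for v, p in column.items()})
```

Each printed column matches the corresponding column of Fig. 4, where 1/3 is shown as 0.33. For the C-5 example, suppose C-5's minimum landing distance were "large" (a hypothetical value). The "large" column then says that runway r must be large with probability 1.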
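The abstract and the Inference Calculation section describe a violation detection step. It combines the new request with the user's request log, computes the inference probability, and denies the request if that probability exceeds the prespecified threshold. This step might be sketched as follows. The class ViolationDetector, the per-attribute probability values and the threshold are hypothetical stand-ins for values the paper derives from the CPTs. The inference probability uses the averaging rule stated in the text rather than a full Bayesian-network computation.

```python
from dataclasses import dataclass, field


@dataclass
class ViolationDetector:
    """Single-user inference violation check (illustrative sketch).

    attribute_prob: hypothetical probability value per attribute, standing in
        for the values the paper derives from the CPTs.
    threshold: prespecified inference threshold for the security node.
    history: request log, mapping each user to the attributes already released.
    """

    attribute_prob: dict
    threshold: float
    history: dict = field(default_factory=dict)

    def inference_probability(self, attributes):
        # Averaging rule from the Inference Calculation section.
        if not attributes:
            return 0.0
        return sum(self.attribute_prob[a] for a in attributes) / len(attributes)

    def request(self, user, attributes):
        """Return True (grant) or False (deny) for a new query request."""
        combined = self.history.get(user, set()) | set(attributes)
        if self.inference_probability(combined) > self.threshold:
            return False  # denied: the request log is left unchanged
        self.history[user] = combined  # granted: append to the request log
        return True


if __name__ == "__main__":
    detector = ViolationDetector(
        {"airport_city": 0.1, "runway_length": 0.9, "runway_width": 0.8},
        threshold=0.55,
    )
    print(detector.request("u1", ["airport_city"]))   # mean 0.1 -> True
    print(detector.request("u1", ["runway_length"]))  # mean 0.5 -> True
    print(detector.request("u1", ["runway_width"]))   # mean 0.6 -> False
```

A plain average can be lowered by first querying weakly related attributes. In the example above, airport_city pulls the mean down. For this reason, a deployment would likely prefer the posterior of the security node computed in the Bayesian network.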
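The two-collaborator example from the Introduction involves four inference channels, with each user querying two of them. It can be illustrated with a small sketch. Each channel is given a hypothetical probability of revealing the sensitive attribute, and channels are combined with a noisy-OR rule. The noisy-OR rule is a modelling assumption swapped in for this illustration, not the paper's method. The collaborators are also assumed to share all their answers, so the collaboration-effectiveness parameters mentioned in the multi-user section are not modelled. The function names are hypothetical.

```python
from math import prod


def combined_inference(channel_probs):
    """Noisy-OR: probability that at least one channel reveals the sensitive
    attribute, assuming the channels act independently."""
    return 1.0 - prod(1.0 - p for p in channel_probs)


def check_collaboration(user_channels, channel_prob, threshold):
    """Compare per-user checks with a check on the collaborators' pooled knowledge.

    user_channels: user -> channels that user has queried.
    channel_prob: channel -> hypothetical probability of inferring the
        sensitive attribute through that channel alone.
    Returns (per-user violation flags, group violation flag).
    """
    individual = {
        user: combined_inference(channel_prob[c] for c in chans) > threshold
        for user, chans in user_channels.items()
    }
    pooled = set().union(*user_channels.values())  # assumes full sharing
    group = combined_inference(channel_prob[c] for c in pooled) > threshold
    return individual, group


if __name__ == "__main__":
    probs = {"c1": 0.3, "c2": 0.3, "c3": 0.3, "c4": 0.3}
    users = {"alice": ["c1", "c2"], "bob": ["c3", "c4"]}
    print(check_collaboration(users, probs, threshold=0.6))
    # ({'alice': False, 'bob': False}, True)
```

Suppose there are four channels at 0.3 each and a threshold of 0.6. Each user alone reaches about 0.51 and passes the individual check. The pooled knowledge, however, reaches about 0.76 and violates it, which is the situation that motivates multi-user detection.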
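The sensitivity analysis described in the Bayesian Network and Sensitivity Analysis sections can be sketched on a minimal hypothetical chain A -> B -> S, where S is the security node. Several elements are assumptions made for illustration: the CPT values (P_A1, B_GIVEN_A, S_GIVEN_B), the finite-difference step, the function sensitivity_report, and the "too sensitive" rule. Under that rule, a node is flagged if shifting its marginal by delta could push P(S=1) over the threshold.

```python
# Hypothetical chain A -> B -> S, where S is the security node.
P_A1 = 0.5                    # prior P(A=1)
B_GIVEN_A = {1: 0.8, 0: 0.3}  # P(B=1 | A=a)
S_GIVEN_B = {1: 0.9, 0: 0.2}  # P(S=1 | B=b)


def p_b(p_a1):
    """Marginal P(B=1) as a function of P(A=1)."""
    return B_GIVEN_A[1] * p_a1 + B_GIVEN_A[0] * (1.0 - p_a1)


def p_s(p_b1):
    """Marginal P(S=1) as a function of P(B=1)."""
    return S_GIVEN_B[1] * p_b1 + S_GIVEN_B[0] * (1.0 - p_b1)


def derivative(f, x, h=1e-4):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def sensitivity_report(threshold, delta=0.1):
    """Sensitivity of P(S=1) to each node's marginal, with a 'too sensitive' flag.

    A node is flagged when shifting its marginal by `delta` could push
    P(S=1) above `threshold`.
    """
    pb = p_b(P_A1)
    current = p_s(pb)
    sens = {
        "B": derivative(p_s, pb),                        # adjacent to S
        "A": derivative(lambda pa: p_s(p_b(pa)), P_A1),  # two hops from S
    }
    flags = {node: current + abs(s) * delta > threshold for node, s in sens.items()}
    return current, sens, flags


if __name__ == "__main__":
    print(sensitivity_report(threshold=0.63))
```

In this chain the sensitivity to A equals (P(S=1|B=1) - P(S=1|B=0)) * (P(B=1|A=1) - P(B=1|A=0)). Its magnitude therefore never exceeds the sensitivity to B, which is in line with the observation that nodes closer to the security node have stronger inference effects. With a threshold of 0.63, only B is flagged. Following the Sensitivity Analysis section, the administrator would then tighten the threshold or block access to B, the attribute closest to the security node.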