
Current Trends in Data Security

Dan Suciu

Joint work with Gerome Miklau



Data Security
Dorothy Denning, 1982:
Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification
Data Security = Confidentiality + Integrity

Data Security
Distinct from systems and network security
Assumes these are already secure

Tools:
Cryptography, information theory, statistics, ...

Applications:
An enabling technology

Outline
Traditional data security
Two attacks
Data security research today
Conclusions

Traditional Data Security


Security in SQL = Access control + Views
Security in statistical databases = Theory

[Griffith&Wade'76, Fagin'78]

Access Control in SQL


GRANT privileges ON object TO users [WITH GRANT OPTION]
privileges = SELECT | INSERT | DELETE | ...
object = table | attribute

REVOKE privileges ON object FROM users [CASCADE]

Views in SQL
A SQL View = (almost) any SQL query
Typically used as:
CREATE VIEW pmpStudents AS SELECT * FROM Students WHERE ...

GRANT SELECT ON pmpStudents TO DavidRispoli

Summary of SQL Security


Limitations:
No row-level access control
Table creator owns the data: that's unfair!

Access control = great success story of the DB community...
... or spectacular failure:
Only 30% assign privileges to users/roles
And then to protect entire tables, not columns

Summary (cont)
Most policies in middleware: slow, error-prone:
SAP has 10^4 tables
GTE over 10^5 attributes
A brokerage house has 80,000 applications
A US government entity thinks that it has 350K

Today the database is not at the center of the policy administration universe
[Rosenthal&Winslett2004]

[Adam&Wortmann89]

Security in Statistical DBs


Goal:
Allow arbitrary aggregate SQL queries
Hide confidential data

OK:
SELECT count(*) FROM Patients WHERE age=42 and sex='M' and diagnostic='schizophrenia'

Not OK:
SELECT name FROM Patients WHERE age=42 and sex='M' and diagnostic='schizophrenia'

[Adam&Wortmann89]

Security in Statistical DBs


What has been tried:

Query restriction
Query-size control, query-set overlap control, query monitoring
None is practical

Data perturbation
Most popular: cell combination, cell suppression
Other methods, for continuous attributes: may introduce bias

Output perturbation
For continuous attributes only
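
One reason query-size control is hard to make practical can be sketched in a few lines. This is only an illustration, not a method taken from [Adam&Wortmann89]: the patient rows, the field names (id, age, sex, cost), the threshold min_size and the function restricted_sum are all invented for the example. The guard refuses any aggregate whose query set is too small or too large, yet two permitted queries differ by exactly one patient, so their difference leaks that patient's value.

```python
# Hypothetical data: eight patients, only one of whom is a 42-year-old male.
PATIENTS = [
    {"id": "p1", "age": 42, "sex": "M", "cost": 10},
    {"id": "p2", "age": 42, "sex": "F", "cost": 20},
    {"id": "p3", "age": 35, "sex": "M", "cost": 30},
    {"id": "p4", "age": 50, "sex": "M", "cost": 40},
    {"id": "p5", "age": 50, "sex": "F", "cost": 50},
    {"id": "p6", "age": 61, "sex": "F", "cost": 60},
    {"id": "p7", "age": 29, "sex": "M", "cost": 70},
    {"id": "p8", "age": 33, "sex": "F", "cost": 80},
]


def restricted_sum(rows, predicate, field, min_size=3):
    """Answer SUM(field) only if the query set is neither too small nor too large."""
    matching = [row for row in rows if predicate(row)]
    if len(matching) < min_size or len(matching) > len(rows) - min_size:
        raise PermissionError("query set size outside the allowed range")
    return sum(row[field] for row in matching)


# Asking about the individual directly is refused (query set of size 1):
#   restricted_sum(PATIENTS, lambda r: r["age"] == 42 and r["sex"] == "M", "cost")

# Two queries that each pass the size check ...
all_males = restricted_sum(PATIENTS, lambda r: r["sex"] == "M", "cost")
males_not_42 = restricted_sum(
    PATIENTS, lambda r: r["sex"] == "M" and r["age"] != 42, "cost"
)

# ... whose difference is exactly the protected individual's value.
leaked = all_males - males_not_42
```

Blocking this kind of difference attack requires reasoning about overlaps between query sets, which is where query-set overlap control and query monitoring come in.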

Summary on Security in Statistical DB


Original goal seems impossible to achieve
Cell combination/suppression are popular, but do not allow arbitrary queries

Outline
Traditional data security
Two attacks
Data security research today
Conclusions

[Chris Anley, Advanced SQL Injection In SQL]

SQL Injection
Your health insurance company lets you see the claims online:

First login:
User: fred
Password: ********

Now search through the claims:
Search claims by: Dr. Lee

SELECT ... FROM ... WHERE doctor='Dr. Lee' and patientID='fred'

SQL Injection
Now try this:

Search claims by: Dr. Lee' OR patientID = 'suciu'; --

... WHERE doctor='Dr. Lee' OR patientID='suciu'; --' and patientID='fred'

Better:

Search claims by: Dr. Lee' OR 1 = 1; --

SQL Injection
When you're done, do this:

Search claims by: Dr. Lee'; DROP TABLE Patients; --

SQL Injection
The DBMS works perfectly. So why is SQL injection possible so often?
Quick answer:
Poor programming: use stored procedures!

Deeper answer:
Move policy implementation from apps to DB
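
The attack and the quick fix can be reproduced in a small sketch. Everything here is an assumption made for illustration: the Claims table, its columns and rows, and both search functions are hypothetical. SQLite has no stored procedures, so the sketch uses bound parameters as a stand-in for the "use stored procedures" advice; both keep user input out of the SQL text. The attack string omits the semicolon shown on the slides because the Python sqlite3 driver executes a single statement per call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Claims (doctor TEXT, patientID TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Claims VALUES (?, ?, ?)",
    [("Dr. Lee", "fred", 100), ("Dr. Lee", "suciu", 250), ("Dr. Kim", "suciu", 75)],
)


def search_vulnerable(doctor, patient_id):
    """Builds the query by string concatenation: user input becomes SQL."""
    sql = (
        "SELECT doctor, patientID, amount FROM Claims "
        "WHERE doctor='" + doctor + "' AND patientID='" + patient_id + "'"
    )
    return conn.execute(sql).fetchall()


def search_parameterized(doctor, patient_id):
    """Passes user input as bound parameters: it is only ever treated as data."""
    sql = "SELECT doctor, patientID, amount FROM Claims WHERE doctor=? AND patientID=?"
    return conn.execute(sql, (doctor, patient_id)).fetchall()


# The text typed into the "Search claims by" box.
attack = "Dr. Lee' OR 1 = 1 --"

leaked_rows = search_vulnerable(attack, "fred")    # every claim in the table
safe_rows = search_parameterized(attack, "fred")   # no doctor has that literal name
honest_rows = search_parameterized("Dr. Lee", "fred")
```

With concatenation, the comment marker removes the patientID condition, so fred sees all claims. With bound parameters the same input matches nothing.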

Latanya Sweeney's Finding

In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees
GIC has to publish the data:
GIC(zip, dob, sex, diagnosis, procedure, ...)

Latanya Sweeney's Finding

Sweeney paid $20 and bought the voter registration list for Cambridge, Massachusetts:

GIC(zip, dob, sex, diagnosis, procedure, ...)
VOTER(name, party, ..., zip, dob, sex)

Latanya Sweeney's Finding

Attributes common to GIC and VOTER: zip, dob, sex

William Weld (former governor) lives in Cambridge, hence is in VOTER
6 people in VOTER share his dob
Only 3 of them were men (same sex)
Weld was the only one in that zip
Sweeney learned Weld's medical records!
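
The linkage step amounts to a join of the two published relations on (zip, dob, sex), keeping only the matches that are unique. The sketch below assumes entirely fictitious rows; the names, zip codes, dates and diagnoses are placeholders, and the function link_records is a hypothetical helper, not Sweeney's actual procedure.

```python
def link_records(gic_rows, voter_rows, keys=("zip", "dob", "sex")):
    """Join GIC with VOTER on the shared attributes and return the
    (name, diagnosis) pairs where exactly one voter matches."""
    voters_by_key = {}
    for voter in voter_rows:
        voters_by_key.setdefault(tuple(voter[k] for k in keys), []).append(voter)

    reidentified = []
    for record in gic_rows:
        candidates = voters_by_key.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match pins the record to one person
            reidentified.append((candidates[0]["name"], record["diagnosis"]))
    return reidentified


# Fictitious data, for illustration only.
VOTER = [
    {"name": "Voter A", "zip": "00001", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter B", "zip": "00002", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter C", "zip": "00002", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter D", "zip": "00001", "dob": "1950-01-01", "sex": "F"},
]
GIC = [
    {"zip": "00001", "dob": "1950-01-01", "sex": "M", "diagnosis": "diagnosis X"},
    {"zip": "00002", "dob": "1950-01-01", "sex": "M", "diagnosis": "diagnosis Y"},
]

matches = link_records(GIC, VOTER)
```

Only the first GIC record is re-identified here; the second is shared by two voters, which is the situation k-anonymity tries to guarantee for every record.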

Latanya Sweeney's Finding

All systems worked as specified, yet important data has leaked

How do we protect against that?

Some of today's research in data security addresses breaches that happen even if all systems work correctly

Summary on Attacks
SQL injection:
A correctness problem:
Security policy implemented poorly in the application

Sweeney's finding:
Beyond correctness:
Leakage occurred when all systems worked as specified

Outline
Traditional data security
Two attacks
Data security research today
Conclusions

Research Topics in Data Security


Rest of the talk:
Information Leakage
Privacy
Fine-grained access control
Data encryption
Secure shared computation

[Samarati&Sweeney98, Meyerson&Williams04]

Information Leakage: k-Anonymity

Definition: each tuple is equal to at least k-1 others

Anonymizing: through suppression and generalization

Original:
First     Last    Age    Race
Harry     Stone   34     Afr-Am
John      Reyser  36     Cauc
Beatrice  Stone   47     Afr-Am
John      Ramos   22     Hisp

Anonymized (k = 2):
First     Last    Age    Race
*         Stone   30-50  Afr-Am
John      R*      20-40  *
*         Stone   30-50  Afr-Am
John      R*      20-40  *

Hard: NP-complete for suppression only
Approximations exist
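
Checking the definition is easy even though finding a good anonymization is hard. The sketch below tests whether each tuple is equal to at least k-1 others, using the four-tuple example from this slide with its columns read as (First, Last, Age, Race). The function name is_k_anonymous is a hypothetical helper.

```python
from collections import Counter


def is_k_anonymous(rows, k):
    """True if every row is identical to at least k-1 other rows."""
    counts = Counter(rows)
    return all(n >= k for n in counts.values())


# Columns: First, Last, Age, Race
ORIGINAL = [
    ("Harry", "Stone", "34", "Afr-Am"),
    ("John", "Reyser", "36", "Cauc"),
    ("Beatrice", "Stone", "47", "Afr-Am"),
    ("John", "Ramos", "22", "Hisp"),
]

# After suppression (*) and generalization (age ranges, R*).
ANONYMIZED = [
    ("*", "Stone", "30-50", "Afr-Am"),
    ("John", "R*", "20-40", "*"),
    ("*", "Stone", "30-50", "Afr-Am"),
    ("John", "R*", "20-40", "*"),
]

original_ok = is_k_anonymous(ORIGINAL, 2)      # False: every tuple is unique
anonymized_ok = is_k_anonymous(ANONYMIZED, 2)  # True: two groups of two
```

The hard part, which this check does not address, is choosing the suppressions and generalizations that lose the least information.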

[Miklau&S04, Miklau&Dalvi&S05,Yang&Li04]

Information Leakage: Query-view Security


Have data:

TABLE Employee(name, dept, phone)

View(s)                          Secret Query             Disclosure?
V(name,phone)                    S(name)                  total
V1(name,dept), V2(dept,phone)    S(name,phone)            big
V(dept)                          S(name)                  tiny
V(name) where dept=HR            S(name) where dept=RD    none

Summary on Information Disclosure


The theoretical research:
Exciting new connections between databases and information theory, probability theory, cryptography [Abadi&Warinschi05]

The applications:
many years away

Privacy
Is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others [Agrawal03]
More complex than confidentiality

Privacy
Involves: Data, Owner, Requester, Purpose, Consent
Example: Alice gives her email to a web service

alice@a.b.com

Privacy policy: P3P



Hippocratic Databases
DB support for implementing privacy policies:
Purpose specification
Consent
Limited use
Limited retention

[Figure: alice@a.b.com flowing into a Hippocratic DB; privacy policy: P3P]

Protection against:
Sloppy organizations
Malicious organizations

[Agrawal03, LeFevre04]

Privacy for Paranoids


Idea: rely on trusted agents

[Figure: an agent replaces alice@a.b.com with aliases aly1@agenthost.com and lice27@agenthost.com]

Protection against:
Sloppy organizations
Malicious attackers

Foreign keys?

[Aggarwal04]

Summary on Privacy
Major concern in industry
Legislation
Consumer demand

Challenge:
How to enforce an organization's stated policies

Fine-grained Access Control


Control access at the tuple level.
Policy specification languages
Implementation

Policy Specification Language


No standard, but usually based on parameterized views.

CREATE AUTHORIZATION VIEW PatientsForDoctors AS
SELECT Patient.*
FROM Patient, Doctor
WHERE Patient.doctorID = Doctor.ID
and Doctor.login = %currentUser

Context parameter: %currentUser

Implementation
The user's query:

SELECT Patient.name, Patient.age
FROM Patient
WHERE Patient.disease = 'flu'

is rewritten to:

SELECT Patient.name, Patient.age
FROM Patient, Doctor
WHERE Patient.disease = 'flu'
and Patient.doctorID = Doctor.ID
and Doctor.login = %currentUser

e.g. Oracle

Two Semantics
The Truman Model = filter semantics
Transform reality
ACCEPT all queries
REWRITE queries
Sometimes misleading results:

SELECT count(*) FROM Patients WHERE disease='flu'

The non-Truman model = deny semantics

Reject queries
ACCEPT or REJECT queries
Execute query UNCHANGED
May define multiple security views for a user

[Rizvi04]
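
The misleading count under filter semantics can be seen in a small sketch. The schema mirrors the Patient and Doctor tables used in the authorization view, but the rows, the logins and both functions are assumptions made for illustration, and a bound parameter stands in for %currentUser. The deny semantics is not modeled here, since deciding whether a query is answerable from the authorized views is the hard part of [Rizvi04].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Doctor (ID INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE Patient (name TEXT, age INTEGER, disease TEXT, doctorID INTEGER);
    INSERT INTO Doctor VALUES (1, 'drlee');
    INSERT INTO Doctor VALUES (2, 'drkim');
    INSERT INTO Patient VALUES ('p1', 30, 'flu', 1);
    INSERT INTO Patient VALUES ('p2', 41, 'flu', 2);
    INSERT INTO Patient VALUES ('p3', 52, 'flu', 2);
    INSERT INTO Patient VALUES ('p4', 28, 'cold', 1);
    """
)


def count_flu_unrestricted():
    """The query as the user wrote it, run with no access control."""
    sql = "SELECT count(*) FROM Patient WHERE disease = 'flu'"
    return conn.execute(sql).fetchone()[0]


def count_flu_truman(current_user):
    """Filter semantics: the query is silently rewritten so that it only
    ranges over the patients of the doctor who is logged in."""
    sql = """
        SELECT count(*)
        FROM Patient, Doctor
        WHERE Patient.disease = 'flu'
          AND Patient.doctorID = Doctor.ID
          AND Doctor.login = ?
    """
    return conn.execute(sql, (current_user,)).fetchone()[0]


true_count = count_flu_unrestricted()    # all flu patients in the table
seen_by_lee = count_flu_truman("drlee")  # only drlee's flu patients
```

The rewritten query succeeds and returns a number, but the user has no indication that it counts only part of the table.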


Summary of Fine Grained Access Control


Trend in industry: label-based security
Killer app: application hosting
Independent franchises share a single table at headquarters (e.g., Holiday Inn)
Application runs under requester's label, cannot see other labels
Headquarters runs Read queries over them

Oracle's Virtual Private Database

[Rosenthal&Winslett2004]

Data Encryption for Publishing


Scientist wants to publish medical research data on the Web
Users and their keys:
All authorized users: Kuser
Patient: Kpat
Doctor: Kdr
Nurse: Knu
Administrator: Kadmin

Complex Policies:

Doctor researchers may access trials
Nurses may access diagnostic
Etc.

What is the encryption granularity?

[Miklau&S.03]

Data Encryption for Publishing


An XML tree protection:

Doctor: Kuser, Kdr
Nurse: Kuser, Knu
Nurse+admin: Kuser, Knu, Kadm

[Figure: XML tree rooted at <patient>, with subtrees <privateData> (<name> JoeDoe, <age> 28, <address> Seattle), <diagnostic> flu, and <trial> (<drug> Tylenol, <placebo> Candy); nodes are guarded by combinations of the keys Kuser, Kdr, Knu, Kpat, Kadm and Kmaster]

Summary on Data Encryption


Industry:
Supported by all vendors: Oracle, DB2, SQL-Server
Efficiency issues still largely unresolved

Research:
Hard theoretical security analysis
[Abadi&Warinschi05]

Secure Shared Processing


Alice has a database DB_A
Bob has a database DB_B
How can they compute Q(DB_A, DB_B) without revealing their data?
Long history in cryptography
Some database queries are easier than the general case

[Agrawal03]

Secure Shared Processing


Task: find intersection without revealing the rest

Alice: a b c d
Bob: c d e

Compute one-way hash:
Alice: h(a) h(b) h(c) h(d)
Bob: h(c) h(d) h(e)

Exchange:
Alice receives: h(c) h(d) h(e)
Bob receives: h(a) h(b) h(c) h(d)

What's wrong?
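
One well-known weakness of exchanging plain hashes is a dictionary attack: when the values come from a small or guessable domain, the receiver can hash every candidate and look up what it was sent. The sketch below assumes SHA-256 as the one-way hash and the lowercase alphabet as the domain. Both are choices made for illustration, as are the function names.

```python
import hashlib


def h(value):
    """One-way hash of a value (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def dictionary_attack(received_hashes, candidate_domain):
    """Recover the values behind the hashes by hashing every candidate."""
    lookup = {h(candidate): candidate for candidate in candidate_domain}
    return {lookup[x] for x in received_hashes if x in lookup}


alice_values = ["a", "b", "c", "d"]
bob_values = ["c", "d", "e"]

# The naive protocol: each side hashes its values and sends the hashes.
alice_hashes = [h(v) for v in alice_values]
bob_hashes = [h(v) for v in bob_values]

# The intersection is computed correctly ...
common_hashes = set(alice_hashes) & set(bob_hashes)

# ... but Bob can also recover everything Alice has.
recovered_by_bob = dictionary_attack(alice_hashes, "abcdefghijklmnopqrstuvwxyz")
```

Here Bob learns a and b as well, which the task was supposed to keep hidden.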


[Agrawal03]

Secure Shared Processing


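Commutative encryption: h(x) = E_A(E_B(x)) = E_B(E_A(x))

Alice: a b c d, encrypts with E_A
Bob: c d e, encrypts with E_B

Each side encrypts its own values and sends them:
Alice sends: E_A(a) E_A(b) E_A(c) E_A(d)
Bob sends: E_B(c) E_B(d) E_B(e)

Each side encrypts what it received:
Alice applies E_A to Bob's values: h(c) h(d) h(e)
Bob applies E_B to Alice's values: h(a) h(b) h(c) h(d)

Both sets are compared:
h(a) h(b) h(c) h(d)
h(c) h(d) h(e)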
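
The exchange can be sketched with exponentiation modulo a prime, which commutes because (x^a)^b = (x^b)^a. This is a toy under stated assumptions, not the construction or parameters of [Agrawal03]: the prime P, the SHA-256 mapping into the group, and all function names are choices made for this example. Both parties are simulated in one process purely to show the data flow.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, chosen only for illustration


def to_group(value):
    """Map a value to an integer in [1, P-1] via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen():
    """Pick a secret exponent coprime to P-1, so encryption is a bijection."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def encrypt(key, element):
    """E_key(element) = element^key mod P."""
    return pow(element, key, P)


def private_intersection(alice_values, bob_values):
    key_a, key_b = keygen(), keygen()

    # Each side encrypts its own values and sends them.
    from_alice = [encrypt(key_a, to_group(v)) for v in alice_values]
    from_bob = [encrypt(key_b, to_group(v)) for v in bob_values]

    # Each side encrypts what it received, giving h(x) = E_A(E_B(x)) = E_B(E_A(x)).
    h_of_bob = {encrypt(key_a, c) for c in from_bob}      # computed by Alice
    h_of_alice = [encrypt(key_b, c) for c in from_alice]  # computed by Bob, order kept

    # Alice matches her own values against the doubly encrypted sets.
    return {v for v, hv in zip(alice_values, h_of_alice) if hv in h_of_bob}


common = private_intersection(["a", "b", "c", "d"], ["c", "d", "e"])
```

Unlike plain hashing, neither side can test candidate values on its own, because computing h(x) requires both secret keys.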

Summary on Secure Shared Processing


Secure intersection, joins, data mining
But are there other examples?

Outline
Traditional data security
Two attacks
Data security research today
Conclusions

Conclusions
Traditional data security confined to one server
Security in SQL
Security in statistical databases

Attacks possible due to:
Poor implementation of security policies: SQL injection
Unintended information leakage in published data

Conclusions
State of the industry:
Data security policies: scattered throughout applications
Database no longer center of the security universe
Needed: automatic means to translate complex policies into physical implementations

State of research: data security in global data sharing
Information leakage, privacy, secure computations, etc.
Database research community has an increased appetite for cryptographic techniques

Questions?
