L03 Data Control

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 21

CT80A0000 – DATA-INTENSIVE SYSTEMS

DISTRIBUTED DATA CONTROL


Lecture

Jiri Musto, D.Sc.


SEMANTIC DATA CONTROL
Objective is to make sure that authorized users perform correct operations on the
database
Aim to maintain database integrity
Methods for data control:
1. View management
2. Security control
3. Integrity control

3
VIEWS
A view is a (virtual) relation generated from a base relation
In SQL, a SELECT-query is stored as a VIEW

Views can be queried like normal relations


Main benefit of views:
Simplify complex queries by using views
Restrict access to data with views
Prevent accidental modification of data (view updates are heavily restricted)

4
VIEWS IN DISTRIBUTED DBMS
Views might be derived from fragments.

Views might be costly if base relations are distributed

You can use materialized views


Instead of a virtual view, the view is cached as a table
Used in practice to have local access instead of remote access
Used to speed up processes
Useful if having small number of data or query requires significant processing

5
TAKEN FROM SNOWFLAKE INC.

Performance Security Simplifies Supports Uses Storage


Benefits Benefits Query Logic Clustering
Regular table ✔

Regular view ✔ ✔

Cached query ✔
result
Materialized view ✔ ✔ ✔ ✔

6
VIEW REFRESHING
1. Immediate mode
When underlying data is updated
View is always consistent but increases update times

2. Deferred mode (preferred in practice)


Update views separately
Can be triggered at different times
▪ Lazily: Before querying a view
▪ Periodically: Every hour, every day, etc.
▪ Forcedly: After a certain number of updates

Views can be updated either


Fully
Incrementally

7
INCREMENTAL VIEW MAINTENANCE
You can utilize:
1. Counting algorithm
• Count the changes done to the base relation to update the views

2. Data skew
• Use different join plans based on differences in the amount of data and changes
• Rebalance

8
DATA SECURITY
Data protection
Prevents the content of data to be understood by unauthorized users
Encryption / decryption

Access control
Prevent the access and operations of data by unauthorized users
Discretionary access control (DAC)
▪ Authorization granted to subject within the database

Mandatory access control (MAC)


▪ Access control by the database based on the sensitiveness of data
▪ Done with security labels (e.g. LBAC)

9
DISCRETIONARY ACCESS CONTROL
Main actors
Subjects (users, groups of users) who execute operations
Operations (queries, functions, procedures)
Objects (data attribute, entity, row, table, etc.)

Check whether a subject may perform an operation on an object


In SQL: defined using GRANT OR REVOKE

Can have one admin class or multiple


Multiple is flexible but recursive authorization may be problematic

10
MANDATORY ACCESS CONTROL
Often associated with Multi-level access control (MLS)
Different security levels (clearances)
Top Secret > Secret > Confidential > Unclassified

Access controlled by 2 rules:


No read up
▪ Can only read data that is within or below users’ clearance level
▪ This is to avoid having access to data that is beyond one’s clearance level
No write down:
▪ Cannot write data below users’ clearance level
▪ This is to avoid accidentally sharing higher level data with lower level security

11
MAC IN RELATIONAL DB
A relation can be classified at different levels:
Relation: all tuples have the same clearance
Tuple: every tuple has its own clearance
Attribute: every attribute has its own clearance

A classified relation is thus multilevel


Appears differently (with different data) to subjects with different clearances

12
https://www.ekransystem.com/en/blog/mac-vs-dac
13
PROS AND CONS
PROS CONS
DAC Flexible Low level of protection
Easy to maintain User privilege overlap
User-friendly Access management
Granular maintenance
MAC High level of protection Difficult to scale
Granularity Manual maintenance
Fewer errors

https://www.ekransystem.com/en/blog/mac-vs-dac

14
ADDITIONAL PROBLEMS IN A DISTRIBUTED
ENVIRONMENT
Remote user authentication
Usually done using a service
The service should be replicated to multiple sites

Management of DAC rules


Problem if users’ group can span multiple sites
Rules stored at some directory based on user groups’ location
Accessing rules may incur remote queries

MAC covert channels


Accessing unauthorized data indirectly (query may contain data that is on a higher security level than
what the database contains)

15
SEMANTIC INTEGRITY CONTROL
Maintain database consistency
Structural constraints
Unique keys, primary and foreign keys, etc.

Behavioral constraints
Dependencies in the relational model

Two components
Integrity constraint specification
Integrity constraint enforcement

16
DIFFERENT CONSTRAINTS
Data model related (predefined)
Primary, foreign, unique key
Attribute types, not-null
Functional dependency between data

Update related (precompiled)


NEW and OLD tuples
Value restrictions

Methods:
IF and CHECK conditions

17
INTEGRITY ENFORCEMENT
1. Detection
• Execute a given query
• Detect an inconsistency across the databases
• Fix or undo

2. Preventive
• Execute a query only if the data will stay consistent within the databases
• Need to determine what state counts as consistent (based on update rules)
• Can use various methods in relational databases, for example:
• CHECK conditions
• ASSERTIONS (check condition on multiple tables)
• TRIGGERS

18
PROBLEMS WITH DISTRIBUTED INTEGRITY
CONTROL
Definition of constraints
How are the constraints defined?

Need to consider what fragments exist

Where to store
Where are the constraints stored?

How is data replicated across databases that have constraints?

Enforcement
Where should the constraints be enforced to minimize costs or increase efficiency?

19
CONCLUSION
Solutions initially designed for centralized systems
Extended for distributed systems
Materialized views and group-based discretionary access control

Semantic integrity control has received less attention


Generally not well supported by distributed DBMS

Full data control is more complex and costly in distributed systems


Definition and storage of the rules (site selection)
Design of enforcement algorithms which minimize communication costs

20

You might also like