Fragmentation

Types of Fragmentation

Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table
are called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid
(combination of horizontal and vertical). Horizontal fragmentation can further be classified into
two techniques: primary horizontal fragmentation and derived horizontal fragmentation.

Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”

Advantages of Fragmentation
• Since data is stored close to the site of usage, efficiency of the database system is
increased.

• Local query optimization techniques are sufficient for most queries since data is locally
available.

• Since irrelevant data is not available at the sites, security and privacy of the database
system can be maintained.

Disadvantages of Fragmentation
• When data from different fragments is required, access times may be very high.

• In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.

• Lack of back-up copies of data at different sites may render the database ineffective in
case of failure of a site.

Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more
fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each
horizontal fragment must have all columns of the original base table.
For example, in the student schema, if the details of all students of the Computer Science course
need to be maintained at the School of Computer Science, then the designer will horizontally
fragment the database as follows −
CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = 'Computer Science';
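
The statement above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the three sample students and the reduced column list (Reg_no, Name, Course) are assumptions made for illustration, not data from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced STUDENT table with invented sample rows.
    CREATE TABLE STUDENT (Reg_no INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
    INSERT INTO STUDENT VALUES
        (1, 'Asha', 'Computer Science'),
        (2, 'Ben', 'Physics'),
        (3, 'Chen', 'Computer Science');
    -- The horizontal fragment keeps every column but only the matching rows.
    CREATE TABLE COMP_STD AS
        SELECT * FROM STUDENT WHERE Course = 'Computer Science';
""")

comp_std = conn.execute(
    "SELECT Reg_no, Name, Course FROM COMP_STD ORDER BY Reg_no"
).fetchall()
# Column names of the fragment, read from SQLite's table metadata.
fragment_columns = [row[1] for row in conn.execute("PRAGMA table_info(COMP_STD)")]

print(comp_std)
print(fragment_columns)
```

The fragment has the same columns as the base table and only the Computer Science rows, which is what the rule "each horizontal fragment must have all columns" asks for.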
Types of horizontal fragmentation
Horizontal fragmentation has two variants:

1. Primary Horizontal Fragmentation (PHF)

2. Derived Horizontal Fragmentation (DHF)

Primary Horizontal Fragmentation (PHF)

• Primary horizontal fragmentation is the process of fragmenting a single table horizontally
(row-wise) using a set of simple predicates (conditions).

Example:

Fragment 1

customer_id | Name | Area | Payment Type | Sex
1 | Bob | London | Credit card | Male
2 | Mike | Manchester | Cash | Male

Fragment 2

customer_id | Name | Area | Payment Type | Sex
3 | Ruby | London | Cash | Female
Another Example:

Account (Acc_No, Balance, Branch_Name, Type).

Suppose the Branch_Name column holds the values Pune, Baroda, and Delhi. The fragment for the
Baroda branch can then be written as:

SELECT * FROM ACCOUNT WHERE Branch_Name = 'Baroda';


Example:

Acc_No  Balance  Branch_Name
A_101   5000     Pune
A_102   10,000   Baroda
A_103   25,000   Delhi

For the above table we can define any simple condition, such as Branch_Name = 'Pune',
Branch_Name = 'Delhi', or Balance < 50000.

Fragmentation1:
SELECT * FROM Account WHERE Branch_Name = 'Pune' AND Balance < 50000;

Fragmentation2:
SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000;

What is a simple predicate?

Given a table R with the set of attributes [A1, A2, …, An], a simple predicate Pi can be expressed as follows;

Pi: Aj θ Value

where θ can be any of the symbols in the set {=, <, >, ≤, ≥, ≠} and Value can be any value from the
domain of the attribute Aj. For example, consider the following table Account given in Figure 1:

Acno  Balance  Branch_Name
A101  5000     Mumbai
A103  10000    New Delhi
A104  2000     Chennai
A102  12000    Chennai
A110  6000     Mumbai
A115  6000     Mumbai
A120  2500     New Delhi

Figure 1: Account table

For the above table, we could define simple predicates such as Branch_name = ‘Chennai’,
Branch_name = ‘Mumbai’, or Balance < 10000, each following the form “Aj θ Value”.
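
As a rough illustration of the form "Aj θ Value", the sketch below represents a simple predicate as an (attribute, operator, value) triple and evaluates it against the rows of Figure 1. The helper names `simple_predicate` and `select` are hypothetical, and the operators are spelled in ASCII (`<=`, `>=`, `!=`) instead of ≤, ≥, ≠.

```python
import operator

# Comparison symbols from the definition, mapped to Python functions.
THETA = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
         "<=": operator.le, ">=": operator.ge, "!=": operator.ne}

# Rows of the Account table from Figure 1.
ACCOUNT = [
    {"Acno": "A101", "Balance": 5000, "Branch_Name": "Mumbai"},
    {"Acno": "A103", "Balance": 10000, "Branch_Name": "New Delhi"},
    {"Acno": "A104", "Balance": 2000, "Branch_Name": "Chennai"},
    {"Acno": "A102", "Balance": 12000, "Branch_Name": "Chennai"},
    {"Acno": "A110", "Balance": 6000, "Branch_Name": "Mumbai"},
    {"Acno": "A115", "Balance": 6000, "Branch_Name": "Mumbai"},
    {"Acno": "A120", "Balance": 2500, "Branch_Name": "New Delhi"},
]

def simple_predicate(attribute, theta, value):
    """Return a function that tests one row against `attribute theta value`."""
    compare = THETA[theta]
    return lambda row: compare(row[attribute], value)

def select(rows, predicate):
    """Return the account numbers of the rows that satisfy the predicate."""
    return [row["Acno"] for row in rows if predicate(row)]

in_chennai = simple_predicate("Branch_Name", "=", "Chennai")
low_balance = simple_predicate("Balance", "<", 10000)

print(select(ACCOUNT, in_chennai))   # ['A104', 'A102']
print(select(ACCOUNT, low_balance))  # ['A101', 'A104', 'A110', 'A115', 'A120']
```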
What is a set of simple predicates?

A set of simple predicates is the set of all conditions collectively required to fragment a relation into subsets.
For a table R, a set of simple predicates can be defined as;

P = {P1, P2, …, Pn}

Example 1
As an example, for the above table Account, if simple conditions are, Balance < 10000, Balance ≥ 10000,
then,

Set of simple predicates P1 = {Balance < 10000, Balance ≥ 10000}

Example 2
As another example, if simple conditions are, Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’,
Balance < 10000, Balance ≥ 10000, then,

Set of simple predicates P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000,
Balance ≥ 10000}

What is a Min-term Predicate?

When we fragment a relation horizontally, we use a single condition or a set of simple predicates to filter
the data. Given a relation R and a set of simple predicates, we can fragment the relation horizontally as
follows (relational algebra expression);

Fragment, Ri = σ_Fi(R), 1 ≤ i ≤ n

where Fi is a conjunction of the simple predicates, otherwise called a
min-term predicate, which can be written as follows;

Min-term predicate, Mi = P1 ∧ P2 ∧ P3 ∧ … ∧ Pn

Here, each Pj may appear either as Pj or in its negated form ¬(Pj). By combining the simple
predicates in their plain and negated forms, we can derive many such min-term predicates.

For Example 1 stated previously, we can derive the set of min-term predicates using the rule stated
above.

We will get 2^n min-term predicates, where n is the number of simple predicates in the given predicate
set. For P1, we have 2 simple predicates. Hence, we will get 4 (2^2) possible combinations of min-term
predicates as follows;
m1 = {Balance < 10000 ∧ Balance ≥ 10000}

m2 = {Balance < 10000 ∧ ¬(Balance ≥ 10000)}

m3 = {¬(Balance < 10000) ∧ Balance ≥ 10000}

m4 = {¬(Balance < 10000) ∧ ¬(Balance ≥ 10000)}
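
The enumeration of the 2^n combinations can be sketched in a few lines of Python. The function name `minterms` is hypothetical, the predicates are kept as plain strings, and ≥ is written as `>=`; the sketch only generates the combinations and does not yet discard contradictory ones.

```python
from itertools import product

def minterms(simple_predicates):
    """Yield every conjunction in which each predicate is kept or negated."""
    for signs in product([True, False], repeat=len(simple_predicates)):
        yield [p if keep else f"NOT ({p})"
               for p, keep in zip(simple_predicates, signs)]

# The set of simple predicates from Example 1.
P1 = ["Balance < 10000", "Balance >= 10000"]

all_minterms = [" AND ".join(term) for term in minterms(P1)]
for number, term in enumerate(all_minterms, start=1):
    print(f"m{number}: {term}")
```

With two predicates the loop prints four min-terms in the same order as m1 to m4 above.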

Our next step is to choose the min-term predicates which can satisfy certain conditions to fragment a
table, and eliminate the others which are not useful. For example, the above set of min-term predicates
can be applied each as a formula Fi stated in the above rule for fragment Ri as follows;

Account1 = σ_{Balance < 10000 ∧ Balance ≥ 10000}(Account)

this can be written as an equivalent SQL query,

Account1 <-- SELECT * FROM account WHERE balance < 10000 AND balance >= 10000;

Account2 = σ_{Balance < 10000 ∧ ¬(Balance ≥ 10000)}(Account)

this can be written as an equivalent SQL query,

Account2 <-- SELECT * FROM account WHERE balance < 10000 AND NOT (balance >= 10000);

where NOT (balance >= 10000) is equivalent to balance < 10000.

Account3 = σ_{¬(Balance < 10000) ∧ Balance ≥ 10000}(Account)

which can be written as an equivalent SQL query,

Account3 <-- SELECT * FROM account WHERE NOT (balance < 10000) AND balance >= 10000;

where NOT (balance < 10000) is equivalent to balance >= 10000.

Account4 = σ_{¬(Balance < 10000) ∧ ¬(Balance ≥ 10000)}(Account)

which can be written as an equivalent SQL query,

Account4 <-- SELECT * FROM account WHERE NOT (balance < 10000) AND NOT (balance >= 10000);

where NOT (balance < 10000) is equivalent to balance >= 10000 and NOT (balance >= 10000) is equivalent
to balance < 10000. This is exactly the same condition as the query for fragment Account1.
From these examples, it is clear that the first query for fragment Account1 (min-term predicate m1)
is invalid, as a record cannot hold two values for the same attribute. That is, the
condition (Balance < 10000 ∧ Balance ≥ 10000) requires the balance to be both less
than 10000 and greater than or equal to 10000, which is not possible. Hence the condition is
contradictory and can be eliminated. For fragment Account2 (min-term predicate m2), the condition is
(balance < 10000 and balance < 10000), which ultimately means balance < 10000 and is valid. Likewise,
fragment Account3 is valid, and Account4 must be eliminated for the same reason as Account1. Finally,
we use the min-term predicates m2 and m3 to fragment the Account relation. The fragments can be
derived as follows for Account;

Account2

equivalent SQL query:

SELECT * FROM account WHERE balance < 10000;

Acno  Balance  Branch_Name
A101  5000     Mumbai
A104  2000     Chennai
A120  2500     New Delhi
A110  6000     Mumbai
A115  6000     Mumbai

Account3

equivalent SQL query:

SELECT * FROM account WHERE balance >= 10000;

Acno  Balance  Branch_Name
A103  10000    New Delhi
A102  12000    Chennai
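
The elimination of contradictory min-terms can be imitated by brute force: evaluate each min-term on a few probe balances and keep only those that at least one probe satisfies. This is a sketch under a stated assumption, namely that the probe values (chosen here on both sides of the 10000 boundary) are enough to expose every satisfiable combination; it is not a general satisfiability test.

```python
from itertools import product

# The two simple predicates on Balance, each as (label, test function).
PREDICATES = [
    ("Balance < 10000", lambda balance: balance < 10000),
    ("Balance >= 10000", lambda balance: balance >= 10000),
]

# Assumed probe values around the 10000 boundary.
PROBE_BALANCES = [0, 9999, 10000, 20000]

def satisfiable_minterms(predicates, probes):
    """Return the labels of min-terms that hold for at least one probe value."""
    kept = []
    for signs in product([True, False], repeat=len(predicates)):
        def holds(balance, signs=signs):
            # A min-term holds when every predicate has its required truth value.
            return all(test(balance) == keep
                       for (_, test), keep in zip(predicates, signs))
        if any(holds(balance) for balance in probes):
            kept.append(" AND ".join(
                name if keep else f"NOT ({name})"
                for (name, _), keep in zip(predicates, signs)))
    return kept

surviving = satisfiable_minterms(PREDICATES, PROBE_BALANCES)
print(surviving)
```

Only the two min-terms corresponding to m2 and m3 survive, matching the manual argument.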

Correctness of Fragmentation
We have chosen a set of min-term predicates which will be used to horizontally fragment a
relation (table) into pieces. Our next step is to validate the chosen fragments for their
correctness, that is, to verify that we did not miss anything. We use the following rules to ensure
that we have not changed the semantic information of the table which we fragment.

1. Completeness – If a relation R is fragmented into a set of fragments, then every tuple (record) of R
must be found in one or more of the fragments. This rule ensures that we have not lost any
records during fragmentation.

2. Reconstruction – After fragmenting a table, we must be able to reconstruct it back to its
original form without any data loss through some relational operation. This rule ensures that
we can construct a base table back from its fragments without losing any information. That is,
we can write a query that combines the fragments (a union for horizontal fragments, a join for
vertical fragments) to get the original relation back.

3. Disjointness – If a relation R is fragmented into a set of sub-tables R1, R2, …, Rn, a record
that belongs to one sub-table is not found in any other sub-table. That is, Ri ∩ Rj = ∅ for every i ≠ j.
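
The three rules can be turned into a small checker for horizontal fragments. The function name `check_horizontal_fragmentation` is hypothetical, and the sketch assumes that rows are unique tuples, so that a relation can be treated as a set.

```python
# Rows of the Account table from Figure 1: (Acno, Balance, Branch_Name).
ACCOUNT = [
    ("A101", 5000, "Mumbai"), ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"), ("A102", 12000, "Chennai"),
    ("A110", 6000, "Mumbai"), ("A115", 6000, "Mumbai"),
    ("A120", 2500, "New Delhi"),
]

def check_horizontal_fragmentation(relation, fragments):
    """Evaluate completeness, reconstruction and disjointness."""
    original = set(relation)
    pieces = [set(fragment) for fragment in fragments]
    union = set().union(*pieces)
    return {
        # Every original row appears in at least one fragment.
        "completeness": original <= union,
        # The union of the fragments gives back exactly the original rows.
        "reconstruction": union == original,
        # No row is counted twice across the fragments.
        "disjointness": sum(len(piece) for piece in pieces) == len(union),
    }

low = [row for row in ACCOUNT if row[1] < 10000]
high = [row for row in ACCOUNT if row[1] >= 10000]
good = check_horizontal_fragmentation(ACCOUNT, [low, high])

# Overlapping predicates (Balance <= 10000 and Balance >= 10000) share A103.
at_most = [row for row in ACCOUNT if row[1] <= 10000]
overlapping = check_horizontal_fragmentation(ACCOUNT, [at_most, high])

print(good)
print(overlapping)
```

The second call shows why the predicates must be mutually exclusive: the fragments are still complete, but account A103 lands in both of them.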

The horizontal fragments using the set of simple predicates P2 from Example 2 can be
generated as follows;
Fragment 1: SELECT * FROM account WHERE branch_name = 'Chennai' AND balance >= 10000;

Account1

Acno  Balance  Branch_Name
A102  12000    Chennai

Fragment 2: SELECT * FROM account WHERE branch_name = 'Chennai' AND balance < 10000;

Account2

Acno  Balance  Branch_Name
A104  2000     Chennai

Fragment 3: SELECT * FROM account WHERE branch_name = 'Mumbai' AND balance >= 10000;

Account3

Acno  Balance  Branch_Name
(no rows: no Mumbai account in Figure 1 has a balance of 10000 or more)

Fragment 4: SELECT * FROM account WHERE branch_name = 'Mumbai' AND balance < 10000;

Account4

Acno  Balance  Branch_Name
A101  5000     Mumbai
A110  6000     Mumbai
A115  6000     Mumbai

In the ACCOUNT table we have a third branch, ‘New Delhi’, which was not specified in the set of simple
predicates. In the fragmentation process we must not leave out the tuples with the value ‘New Delhi’.
That is the reason we include the min-term predicates m5 and m6, which give the following fragments;

Fragment 5: SELECT * FROM account WHERE branch_name <> 'Mumbai' AND branch_name <>
'Chennai' AND balance >= 10000;

Account5

Acno  Balance  Branch_Name
A103  10000    New Delhi

Fragment 6: SELECT * FROM account WHERE branch_name <> 'Mumbai' AND branch_name <>
'Chennai' AND balance < 10000;

Account6

Acno  Balance  Branch_Name
A120  2500     New Delhi

Correctness of fragmentation:

Completeness: The tuples of the table Account are distributed into the different fragments, and no
records were omitted. In other words, by performing the union operation between all
the Account table fragments Account1 to Account6, we will be able to
get Account back without any information loss. Hence, the above fragmentation is complete.

Reconstruction: As said before, by performing the Union operation between all the fragments,
we will be able to get the original table back. Hence, the fragmentation is correct and the
reconstruction property is satisfied.

Disjointness: When we perform the Intersect operation between any two of the above fragments, we
get the empty set as result, as no record appears in more than one fragment. Hence, the
disjointness property is satisfied.
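
The six fragments and the correctness argument can be reproduced with sqlite3 on the data of Figure 1. This is a sketch only: the lower-case column names follow the queries above, and each fragment is compared by account number alone, which assumes acno is the primary key.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acno TEXT PRIMARY KEY, balance INTEGER, branch_name TEXT)"
)
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A101", 5000, "Mumbai"), ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"), ("A102", 12000, "Chennai"),
    ("A110", 6000, "Mumbai"), ("A115", 6000, "Mumbai"),
    ("A120", 2500, "New Delhi"),
])

# Branch conditions, including the "neither Mumbai nor Chennai" case.
BRANCH_CONDITIONS = [
    "branch_name = 'Chennai'",
    "branch_name = 'Mumbai'",
    "branch_name <> 'Mumbai' AND branch_name <> 'Chennai'",
]
BALANCE_CONDITIONS = ["balance >= 10000", "balance < 10000"]

# Fragments 1 to 6, in the same order as in the text.
fragments = []
for branch in BRANCH_CONDITIONS:
    for balance in BALANCE_CONDITIONS:
        query = f"SELECT acno FROM account WHERE {branch} AND {balance}"
        fragments.append({row[0] for row in conn.execute(query)})

all_accounts = {row[0] for row in conn.execute("SELECT acno FROM account")}
is_complete = set().union(*fragments) == all_accounts
is_disjoint = all(a.isdisjoint(b) for a, b in combinations(fragments, 2))

print(fragments)
print(is_complete, is_disjoint)
```

Fragment 3 comes out empty, and the union of all six fragments returns every account exactly once.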

2) Derived horizontal fragmentation

Fragmentation that is derived from the fragmentation of a primary (owner) relation is called
derived horizontal fragmentation.
Example: Refer to the example of primary fragmentation given above.

The following fragmentations are derived from primary fragmentation.

Fragmentation1:
SELECT * FROM Account WHERE Branch_Name = 'Baroda' AND Balance < 50000;

Fragmentation2:
SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000;

In Derived Horizontal Fragmentation we fragment a table based on the constraints defined on
another table. Both tables are linked with each other with the help of primary and foreign keys
and must establish the Owner-Member relation. We use the primary horizontal fragmentation
technique when we want to horizontally fragment a table which is not dependent on any other
owner table. But in most cases, we need to fragment a database as a whole. For
example, consider a relation which is connected with another relation using the foreign key
concept. That is, whenever a record is inserted into the member table, the foreign key column
value of the inserted record must be verified for its availability in the owner table. In such a
situation, we cannot fragment the parent table and the child table independently of each other.

Owner Table in Derived Horizontal Fragmentation

Owner table is a parent table to which we apply the constraint.

Member Table in Derived Horizontal Fragmentation

Member table is a child table that can be fragmented but by following the constraints of the parent
table.

If we fragment the tables separately, then for every insertion of a record into the member table the
system must verify that a matching value exists in the parent table, which may be stored in a fragment
at another site. Hence, for this case, Primary Horizontal Fragmentation would not work.

Let’s consider an example where an international university maintains information about its
students. It stores information about each student in the STUDENT table and the student addresses in the
ADDRESS table as follows;

STUDENT( RollNo, Name, Marks, Country)

ADDRESS(RollNo, Address)

RollNo  NAME     MARKS  COUNTRY
01      Fazal    22     IRAQ
02      Abdul    66     Italy
03      Sameed   77     UK
04      Shahzeb  90     China
05      Mumraiz  66     China
Table 1: STUDENT table

RollNo Address
01 City A, IRAQ
01 City A, UK
02 City D, Italy
02 City A, Pakistan
03 City D, IRAQ
04 City D, Iraq
04 City A, Pakistan
05 City B, China
Table 2: Address Table

If the organization fragments the relation STUDENT on the Country attribute, it needs to
create 4 fragments using horizontal fragmentation, as shown in the table below;

STUDENT1
ROLLNO  NAME     MARKS  COUNTRY
01      Fazal    22     IRAQ

STUDENT2
ROLLNO  NAME     MARKS  COUNTRY
02      Abdul    66     Italy

STUDENT3
ROLLNO  NAME     MARKS  COUNTRY
05      Mumraiz  66     China
04      Shahzeb  90     China

STUDENT4
ROLLNO  NAME     MARKS  COUNTRY
03      Sameed   77     UK
Table 3: Horizontal fragments of Table 1 on Country attribute
Now, it is necessary to fragment the second relation ADDRESS based on the fragments created in the
STUDENT relation. The fragmentation of ADDRESS is done as a set of semi-joins as follows.

ADDRESS1 = ADDRESS ⋉  STUDENT1

ADDRESS2 = ADDRESS ⋉  STUDENT2

ADDRESS3 = ADDRESS ⋉  STUDENT3

ADDRESS4 = ADDRESS ⋉  STUDENT4

This will result in four fragments of ADDRESS, where the addresses of all students in fragment
STUDENT1 go into ADDRESS1, the addresses of all students in fragment STUDENT2
go into ADDRESS2, and so on.
The resultant fragments of ADDRESS will be the following;

RollNo Address
01 City A, IRAQ
01 City A, UK
Table 4: Showing fragment 1 of “address” table

RollNo Address
02 City D, Italy
02 City A, Pakistan
Table 5: Showing fragment 2 of “address” table

RollNo Address
04 City D, Iraq
04 City A, Pakistan
05 City B, China
Table 6: Showing fragment 3 of “address” table

RollNo Address
03 City D, IRAQ
Table 7: Showing fragment 4 of “address” table
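
The semi-joins can be expressed in SQL with an IN subquery, which keeps only the ADDRESS rows whose RollNo occurs in the matching STUDENT fragment. The sketch below uses sqlite3 with the data of Table 1 and Table 2; storing RollNo as text and selecting each STUDENT fragment by its Country value are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (RollNo TEXT PRIMARY KEY, Name TEXT, Marks INTEGER, Country TEXT);
    CREATE TABLE ADDRESS (RollNo TEXT REFERENCES STUDENT(RollNo), Address TEXT);
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    ("01", "Fazal", 22, "IRAQ"), ("02", "Abdul", 66, "Italy"),
    ("03", "Sameed", 77, "UK"), ("04", "Shahzeb", 90, "China"),
    ("05", "Mumraiz", 66, "China"),
])
conn.executemany("INSERT INTO ADDRESS VALUES (?, ?)", [
    ("01", "City A, IRAQ"), ("01", "City A, UK"),
    ("02", "City D, Italy"), ("02", "City A, Pakistan"),
    ("03", "City D, IRAQ"),
    ("04", "City D, Iraq"), ("04", "City A, Pakistan"),
    ("05", "City B, China"),
])

# ADDRESS semi-join STUDENTi: ADDRESS rows that have a partner in the fragment.
SEMI_JOIN = """
    SELECT RollNo, Address FROM ADDRESS
    WHERE RollNo IN (SELECT RollNo FROM STUDENT WHERE Country = ?)
    ORDER BY RollNo, Address
"""
address_fragments = {
    country: conn.execute(SEMI_JOIN, (country,)).fetchall()
    for country in ["IRAQ", "Italy", "China", "UK"]
}

for country, rows in address_fragments.items():
    print(country, rows)
```

Each ADDRESS row follows its student, regardless of the country named inside the address text itself.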

Checking the fragments for correctness in Derived Horizontal Fragmentation

Completeness: The completeness of a derived horizontal fragmentation is more complex than that of
primary horizontal fragmentation, because the predicates used determine
the fragmentation of two different tables/relations. Formally, for the fragmentation of two relations R and
T, such as {R1, R2, …, Rn} and {T1, T2, …, Tn}, there should be one common attribute such as A. Then, for
each tuple of fragment Ri, there should be a tuple in Ti which has the same value for A. This concept is
called referential integrity.

The derived fragmentation of Address is complete, because the values of the common attribute RollNo
in the fragments STUDENTi and Addressi are the same. For example, the value present in RollNo of
STUDENT1 is also and only present in Address1, etc.

Reconstruction: Reconstruction of the pre-existing tables is possible by a union operation.

Disjointness: If the minterm predicates are mutually exclusive, then the disjointness rule is satisfied for
primary horizontal fragmentation. The derived fragments are then disjoint as well, provided each member
tuple refers to exactly one owner tuple, as each ADDRESS row does here through RollNo.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical
fragmentation can be used to enforce privacy of data.

For example, let us consider that a University database keeps records of all registered students in a
Student table having the following schema.

STUDENT

Reg_no Name Course Address Semester Fees Marks

Now, the fees details are maintained in the accounts section. In this case, the designer will fragment the
database as follows −

CREATE TABLE STD_FEES AS
SELECT Reg_no, Fees
FROM STUDENT;

Each vertical fragment is a subset of the attributes of the original table.


Fragment 1

customer_id  Name  Area        Sex
1            Bob   London      Male
2            Mike  Manchester  Male
3            Ruby  London      Female

Fragment 2

customer_id  Payment Type
1            Credit card
2            Cash
3            Cash
Example:

Acc_No  Balance  Branch_Name
A_101   5000     Pune
A_102   10,000   Baroda
A_103   25,000   Delhi

Fragmentation1:
SELECT Acc_No, Balance FROM Account;

Fragmentation2:
SELECT Acc_No, Branch_Name FROM Account;

Complete vertical fragmentation

• A complete vertical fragmentation generates a set of vertical fragments which together include all the
attributes of the original relation.

• Reconstruction of a vertical fragmentation is performed by using the Full Outer Join operation on the
fragments.
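
A small sqlite3 sketch of vertical fragmentation and its reconstruction follows. The sample rows and the second fragment STD_INFO are assumptions for illustration. The sketch uses a plain inner join on the key instead of a full outer join: when both fragments carry every Reg_no, the two joins return the same rows, and older SQLite versions do not support FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced STUDENT table with invented sample rows.
    CREATE TABLE STUDENT (Reg_no INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Fees INTEGER);
    INSERT INTO STUDENT VALUES
        (1, 'Asha', 'Computer Science', 1200),
        (2, 'Ben', 'Physics', 900);
    -- Both vertical fragments repeat the primary key Reg_no.
    CREATE TABLE STD_FEES AS SELECT Reg_no, Fees FROM STUDENT;
    CREATE TABLE STD_INFO AS SELECT Reg_no, Name, Course FROM STUDENT;
""")

original = conn.execute(
    "SELECT Reg_no, Name, Course, Fees FROM STUDENT ORDER BY Reg_no"
).fetchall()

# Reconstruction: join the fragments on the shared primary key.
rebuilt = conn.execute("""
    SELECT i.Reg_no, i.Name, i.Course, f.Fees
    FROM STD_INFO AS i JOIN STD_FEES AS f ON i.Reg_no = f.Reg_no
    ORDER BY i.Reg_no
""").fetchall()

print(rebuilt == original)
```

Because the key is present in both fragments, the join lines the columns up again row by row.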

Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is
used. This is the most flexible fragmentation technique since it generates fragments with
minimal extraneous information. However, reconstruction of the original table is often an
expensive task.

Hybrid fragmentation can be done in two alternative ways −

• At first, generate a set of horizontal fragments; then generate vertical fragments from
one or more of the horizontal fragments.

• At first, generate a set of vertical fragments; then generate horizontal fragments from
one or more of the vertical fragments.

Hybrid fragmentation is thus achieved by performing horizontal and vertical
partitioning together. A mixed fragment is a group of rows and columns of a relation.
Example: Consider the following table, Employee, which consists of employee information.

Emp_ID  Emp_Name  Emp_Address  Emp_Age  Emp_Salary
101     Surendra  Baroda       25       15000
102     Jaya      Pune         37       12000
103     Jayesh    Pune         47       10000

Fragmentation1:
SELECT Emp_ID, Emp_Name FROM Employee WHERE Emp_Age < 40;
Fragmentation2:
SELECT Emp_ID FROM Employee WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000;

Reconstruction of Hybrid Fragmentation


The original relation in hybrid fragmentation is reconstructed by performing UNION and FULL
OUTER JOIN.
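
The reconstruction can be sketched in plain Python on the employee table. The split used here (rows by Emp_Age, then columns into a "personal" and a "payroll" group) and the helper names `project` and `join_on_key` are assumptions chosen for illustration, not the fragments defined above.

```python
EMPLOYEE = [
    {"Emp_ID": 101, "Emp_Name": "Surendra", "Emp_Address": "Baroda",
     "Emp_Age": 25, "Emp_Salary": 15000},
    {"Emp_ID": 102, "Emp_Name": "Jaya", "Emp_Address": "Pune",
     "Emp_Age": 37, "Emp_Salary": 12000},
    {"Emp_ID": 103, "Emp_Name": "Jayesh", "Emp_Address": "Pune",
     "Emp_Age": 47, "Emp_Salary": 10000},
]

def project(rows, columns):
    """Vertical step: keep only the named columns of every row."""
    return [{column: row[column] for column in columns} for row in rows]

def join_on_key(left, right, key):
    """Join two vertical fragments on their shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]}
            for row in left if row[key] in right_by_key]

# Horizontal step first: two mutually exclusive age ranges.
horizontal = [
    [row for row in EMPLOYEE if row["Emp_Age"] < 40],
    [row for row in EMPLOYEE if row["Emp_Age"] >= 40],
]

# Then split each horizontal fragment vertically; both parts keep Emp_ID.
PERSONAL = ["Emp_ID", "Emp_Name", "Emp_Address"]
PAYROLL = ["Emp_ID", "Emp_Age", "Emp_Salary"]
hybrid = [(project(rows, PERSONAL), project(rows, PAYROLL)) for rows in horizontal]

# Reconstruction: join the vertical parts, then union the horizontal pieces.
rebuilt = []
for personal, payroll in hybrid:
    rebuilt.extend(join_on_key(personal, payroll, "Emp_ID"))
rebuilt.sort(key=lambda row: row["Emp_ID"])

print(rebuilt == EMPLOYEE)
```

The join undoes the vertical step inside each horizontal fragment, and the union of the joined pieces undoes the horizontal step.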
