National Institute of Technology Karnataka, Surathkal

A Project Report on

“AN ENHANCED GROUPING ALGORITHM
FOR VERTICAL PARTITIONING PROBLEM IN DDBs”

Under the Guidance of:

Prof. Ananthanarayana V. S.

Submitted by:

Mr. Buddharaj Ambhore (10IT04F)


[M.Tech(IT)]

Department of Information Technology,


National Institute of Technology Karnataka, Surathkal
Abstract

Distribution design involves making decisions on the fragmentation
and allocation of data across the sites of a computer network. Vertical partitioning
is the process of subdividing the attributes of a relation to generate fragments. In
this paper, we propose an enhancement to our previous vertical partitioning
algorithm, which uses a grouping approach.

This algorithm starts from the attribute affinity matrix and generates
initial groups based on the affinity values between attributes. Then, it
attempts to merge the initial groups to produce final groups that will represent
the fragments.
Contents

1. Introduction

2. Related Work

3. Enhancement to the Grouping Algorithm


4. Description of the enhanced algorithm

5. Implementation Details

6. Conclusion

7. References
1. Introduction

Distributed and parallel processing on database management systems (DBMS) is


an efficient way of improving the performance of applications that manipulate large
volumes of data. This may be accomplished by removing irrelevant data accessed
during the execution of queries and by reducing the data exchange among sites,
which are the two main goals of the design of distributed databases. The primary
concern of distributed database systems is the design of the fragmentation and
allocation of the underlying database. Distribution design involves making
decisions on the fragmentation and placement of data across the sites of a computer
network. The first phase of the distribution design in a top-down approach is the
fragmentation phase, which is the process of clustering into fragments the
information accessed simultaneously by applications. The fragmentation phase is
then followed by the allocation phase, which handles the physical storage of the
generated fragments among the nodes of a computer network, and the replication
of fragments.

2. Related Work

Most of the vertical fragmentation algorithms have started from


constructing an attribute affinity matrix from the attribute usage matrix: the
attribute affinity matrix is an n x n matrix for the n-attribute
problem whose (i, j) element equals the “between attributes” affinity, which is the
total number of accesses of transactions referencing both attributes i and j. An
iterative binary partitioning method has been used based on first clustering the
attributes and then applying empirical objective functions or mathematical cost
functions to perform the fragmentation. Work on attribute partitioning and
attribute clustering has described the implementation of a self-reorganizing database
management system that carries out attribute clustering. That work also shows that in a
database management system where storage cost is low compared to the cost of
accessing the subfiles, it is beneficial to cluster the attributes, since the increase in
storage cost will be more than offset by the saving in access cost. Hoffer
developed a non-linear, zero-one program, which minimizes a linear combination
of storage, retrieval and update costs, with capacity constraints for each file.
Navathe and Ra have developed a new algorithm based on a graphical technique.
This algorithm starts from the attribute affinity matrix by considering it as a
complete graph called the “affinity graph” in which an edge value represents the
affinity between the two attributes, and then forms a linearly connected spanning
tree. The algorithm generates all meaningful fragments in one iteration by
considering a cycle as a fragment. A linearly connected tree has only two ends. By
a “linearly connected tree” we imply a tree that is constructed by including one
edge at a time such that only edges at the “first” and the “last” node of the tree
would be considered for inclusion. “Affinity cycles” are then formed in this spanning
tree by including the edges of high affinity value around the nodes and “growing”
these cycles as large as possible. After the cycles are formed, partitions are easily
generated by cutting the cycles apart along “cut-edges”. The major disadvantage of
this algorithm is the relative complexity involved in implementation. In this paper,
however, we propose a new and simple algorithm with fewer implementation steps
to reach the desired results.

3. Enhancement to the Grouping Algorithm


In [1], we started from the attribute affinity matrix (AAM) generated from
the attribute usage matrix (AUM) by considering it as a complete group. Figure 1
shows an attribute usage matrix for a relation
containing 10 attributes with respect to 8 queries (Q1, Q2,…, Q8) that are initiated
by the applications. The attribute affinity matrix measures the bond between two
attributes of a relation according to how they are accessed by the applications. The
affinity between attributes i and j is defined as

aff(i, j) = Σk∈Q acck(i, j),

where acck(i, j) is the number of accesses of query k when it references both
attributes i and j. Figure 2 shows the attribute affinity matrix derived from the
attribute usage matrix of Figure 1. The two factors we added in the enhanced
version of our algorithm are listed below; a small illustrative sketch follows the list:
• Attributes Link Factor (ALF): We added this factor to avoid poor
grouping between two (or more) attributes in step 1 of [1]. We use
this factor in the condition aff(i, j) ≥ P(Ai) * ALF/100, which must hold before the attributes are grouped.
• Groups Link Factor (GLF): We added this factor to avoid poor grouping
between two groups in step 2 of [1]. Here we have two
scenarios: first, if we want to connect attribute Ai in group k to an independent
attribute Aj, then the condition aff(i, j) ≥ P(gk) * GLF/100 must be true; second, if
we want to connect attribute Ai in group k to attribute Aj in group l, then the
condition P(gl) ≥ P(gk) * GLF/100 must be satisfied.
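
As an illustration, the following Java sketch shows how the attribute affinity matrix can be built from the attribute usage matrix and the query access frequencies, together with the ALF and GLF threshold checks. The class and method names are hypothetical and are not taken from the project source; in particular, P(Ai) and P(gk) are simply passed in as parameters.

// Illustrative sketch (not the project's actual source code).
public class AffinitySketch {

    // usage[k][i] = 1 if query k references attribute i, 0 otherwise;
    // freq[k] is the access frequency of query k.
    // Returns aff, where aff[i][j] is the sum of freq[k] over all queries k
    // that reference both attribute i and attribute j.
    static int[][] buildAffinityMatrix(int[][] usage, int[] freq) {
        int nAttrs = usage[0].length;
        int[][] aff = new int[nAttrs][nAttrs];
        for (int k = 0; k < usage.length; k++) {
            for (int i = 0; i < nAttrs; i++) {
                if (usage[k][i] == 0) continue;
                for (int j = 0; j < nAttrs; j++) {
                    if (usage[k][j] == 1) {
                        aff[i][j] += freq[k];   // query k accesses both i and j
                    }
                }
            }
        }
        return aff;
    }

    // Attributes Link Factor condition: aff(i, j) >= P(Ai) * ALF / 100.
    static boolean satisfiesALF(int affIJ, int powerOfAi, int alf) {
        return affIJ >= powerOfAi * alf / 100.0;
    }

    // Groups Link Factor condition: aff(i, j) >= P(gk) * GLF / 100.
    static boolean satisfiesGLF(int affIJ, int groupPower, int glf) {
        return affIJ >= groupPower * glf / 100.0;
    }
}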

4. Description of the enhanced algorithm

Step 1. Iterate starting from the first attribute (first row in the affinity matrix), trying to
generate a group by joining it to the other attribute(s) with the highest affinity value
Max(aff(i, j)), forming the first initial group. The resulting group will have a power
factor P(g) that takes the affinity value aff(i, j). Here we have three possible
scenarios: first, the two attributes are independent (do not belong to any initial
group); in this case we perform a direct grouping if the selected highest affinity
value aff(i, j) ≥ P(Ai) * ALF/100. Second, one of the attributes i or j belongs to a
group k; in this case we join the independent attribute to group k if the condition
aff(i, j) ≥ P(gk) is true. Third, attribute Ai is in group k and attribute Aj is in
group l; in this case we join the two groups if P(gk) = P(gl). By the end of this step,
all possible initial groups have been formed.
Step 2. Iterate starting from the first initial group produced in step 1, searching
for the “best extension”. At this step we have two possible
scenarios: first, the “best extension” connects attribute Ai in group k to an attribute
Aj that has not been joined to any initial group in step 1; in this case the
independent attribute Aj is joined to group k if the condition aff(i, j) ≥ P(gk) *
GLF/100 is true, and the extended group’s
power becomes the aff(i, j) value. Second, the “best extension” connects
attribute Ai in group k to attribute Aj in group l; in this case
we need to ensure that the two conditions aff(i, j) ≥ P(gk) * GLF/100 and P(gl) ≥
P(gk) * GLF/100 are true. The new group’s power will be equal to the power of
group l.
We keep repeating this last step until no further “best
extension” can be found, at which point we obtain the final groups produced by the
algorithm.
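
To make the two steps more concrete, the following Java fragment is a simplified, hypothetical sketch of the procedure just described. It is not the project's actual implementation: the data structures, the choice of P(Ai) as the diagonal entry aff(i, i), and the tie-breaking when several partners have the same affinity are assumptions made only for illustration.

import java.util.*;

// Hypothetical sketch of the two-step grouping procedure (simplified).
class EnhancedGroupingSketch {

    static List<Set<Integer>> run(int[][] aff, int alf, int glf) {
        int n = aff.length;
        int[] groupOf = new int[n];                 // group index per attribute, -1 = independent
        Arrays.fill(groupOf, -1);
        List<Set<Integer>> groups = new ArrayList<>();
        List<Integer> power = new ArrayList<>();    // P(g) of each group

        // Step 1: for each attribute pick the partner with the highest affinity
        // and apply the three grouping scenarios described above.
        for (int i = 0; i < n; i++) {
            int j = bestPartner(aff, i);
            if (j < 0) continue;
            int a = aff[i][j];
            int gi = groupOf[i], gj = groupOf[j];
            if (gi < 0 && gj < 0) {
                // both independent: direct grouping if aff(i,j) >= P(Ai)*ALF/100
                // (P(Ai) is taken as the diagonal entry aff(i,i) -- an assumption)
                if (a >= aff[i][i] * alf / 100.0) {
                    groups.add(new HashSet<>(Arrays.asList(i, j)));
                    power.add(a);
                    groupOf[i] = groupOf[j] = groups.size() - 1;
                }
            } else if (gi < 0 || gj < 0) {
                // one attribute already grouped: join the other if aff(i,j) >= P(gk)
                int k = Math.max(gi, gj);
                int free = (gi < 0) ? i : j;
                if (a >= power.get(k)) { groups.get(k).add(free); groupOf[free] = k; }
            } else if (gi != gj && power.get(gi).equals(power.get(gj))) {
                // attributes in two different groups of equal power: merge the groups
                groups.get(gi).addAll(groups.get(gj));
                for (int x : groups.get(gj)) groupOf[x] = gi;
                groups.get(gj).clear();
            }
        }

        // Step 2: repeatedly apply the "best extension" while the GLF conditions hold.
        boolean extended = true;
        while (extended) {
            extended = false;
            for (int k = 0; k < groups.size() && !extended; k++) {
                for (int i : new ArrayList<>(groups.get(k))) {
                    int j = bestPartner(aff, i);
                    if (j < 0 || groupOf[j] == k) continue;
                    int a = aff[i][j];
                    if (groupOf[j] < 0 && a >= power.get(k) * glf / 100.0) {
                        // extend group k with an independent attribute; new power = aff(i,j)
                        groups.get(k).add(j); groupOf[j] = k;
                        power.set(k, a);
                        extended = true; break;
                    } else if (groupOf[j] >= 0
                            && a >= power.get(k) * glf / 100.0
                            && power.get(groupOf[j]) >= power.get(k) * glf / 100.0) {
                        // merge group l into group k; new power = P(gl)
                        int l = groupOf[j];
                        groups.get(k).addAll(groups.get(l));
                        for (int x : groups.get(l)) groupOf[x] = k;
                        groups.get(l).clear();
                        power.set(k, power.get(l));
                        extended = true; break;
                    }
                }
            }
        }
        groups.removeIf(Set::isEmpty);
        return groups;
    }

    // Highest-affinity partner of attribute i (ignoring the diagonal entry).
    private static int bestPartner(int[][] aff, int i) {
        int best = -1;
        for (int j = 0; j < aff.length; j++)
            if (j != i && (best < 0 || aff[i][j] > aff[i][best])) best = j;
        return best;
    }
}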
5. Implementation Details
The enhanced grouping algorithm is implemented in (core) Java. The aim of the
algorithm is to divide the attributes of the relation into fragments. The input and
output of the program are described below.

Input:

Here the user has to specify the number of attributes in the relation and the attribute
usage matrix. The user then has to enter the access frequency of each query. Once the
user presses the enter key, the program internally computes the attribute affinity
matrix, which is then used to divide the attributes into groups.
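
A minimal sketch of this input step is shown below, assuming the values are read from standard input; the prompts, class name, and input order are illustrative only and may differ from the actual program.

import java.util.Scanner;

// Illustrative input-reading sketch (not the project's actual code).
public class InputSketch {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Number of attributes: ");
        int nAttrs = in.nextInt();
        System.out.print("Number of queries: ");
        int nQueries = in.nextInt();

        // usage[k][i] = 1 if query k references attribute i, 0 otherwise
        int[][] usage = new int[nQueries][nAttrs];
        System.out.println("Enter the attribute usage matrix (one row per query):");
        for (int k = 0; k < nQueries; k++)
            for (int i = 0; i < nAttrs; i++)
                usage[k][i] = in.nextInt();

        // access frequency of each query
        int[] freq = new int[nQueries];
        System.out.println("Enter the access frequency of each query:");
        for (int k = 0; k < nQueries; k++)
            freq[k] = in.nextInt();

        // The attribute affinity matrix is then computed from usage and freq
        // (Section 3) and the grouping steps of Section 4 are applied to it.
    }
}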
Output:
Using the usage matrix and the access frequency of each query, the attribute
affinity matrix is calculated. The algorithm then tries to find all possible
groups, starting from the first row and first column, by searching for the attributes
that satisfy the Attributes Link Factor condition and grouping those attributes. It
also calculates the group power for each possible group in the relation. Finally, it
performs the group extension to obtain the final groups with their group powers,
and the output is produced in the form of partitioned attributes. The output of the
enhanced algorithm is as follows:
6. Conclusion

This algorithm is more flexible than our previous grouping algorithm and
more efficient for the vertical partitioning problem, because the added factors provide
more control over the final groups produced,
based on the problem specifications. The major advantage of this algorithm is that
it is simple to understand and easy to implement (only two steps).
Our final results using the 10-attribute and 20-attribute examples were identical to those
obtained by Navathe et al. and by Navathe and Ra’s graphical algorithm, but with better
performance and more flexibility.
This algorithm is also more efficient for the vertical partitioning problem because it
eliminates the deficiencies of binary partitioning and the complexity of the graphical
algorithm. Finally, we note that the values of the enhancement factors are chosen
based on several qualitative and quantitative issues, such as the network
bandwidth, the number of sites, the number of attributes in a relation, the
frequency and type (retrieval or update) of the queries/transactions, the nature of the
distributed database management system used (heterogeneous or homogeneous),
and lastly the person who designs and tests the results.

7. References

[1] M. AlFares et al., “Vertical Partitioning for Database Design: A Grouping
Algorithm”, ISCA 16th International Conference on Software Engineering and
Data Engineering (SEDE-2007), Las Vegas, Nevada, USA, July 19-11, 2007,
pp. 218-223.
