The Effect of Number of Agents On Optimization of Adaptivity Join Queries in Heterogeneous Distributed Databases


Computer Science Section

Mohammad-Reza FEIZI-DERAKHSHI (1), Hasan ASIL (2), Amir ASIL (3), Elnaz ZAFARANI (4)
(1) Department of Computer, University of Tabriz, Tabriz; (2, 3) Islamic Azad University, Azarshahr Branch, Azarshahr; (4) Islamic Azad University, Tabriz Branch, Tabriz, Iran
(2) h.asil@iauazar.ac.ir

Journal of Applied Computer Science & Mathematics, no. 15 (7) /2013, Suceava

Abstract–Distributed systems involve distributing data, coordinating activities, and controlling the distributed components of the system. They are mostly used to share the workload or to move data processing functions closer to where they are needed, and the same concern applies to database query optimization. The growing need to optimize query processing in databases has given rise to many methods for doing so. This article presents a multi-agent system for heterogeneous distributed databases that combines query processing optimization techniques with adaptivity: an agent is added to the system to make the database adaptable. The article then analyzes how strongly the number of agents affects the optimization of join query processing in heterogeneous distributed databases.

Keywords: join ordering, adaptable system, multi-agent systems, database.

I. INTRODUCTION

A distributed database is a logical collection of interrelated databases connected through a computer network. Distributing these resources increases performance, reliability, and accessibility. A distributed database presents the user with an interface that looks like a single system, so the user can access data without knowing where it is physically located [2]. The way the system responds to queries has a large effect and can reduce the cost of running a database [9]. The join operator, through which tables are combined in a query or part of a query, also influences query optimization: the order in which joins are performed affects the cost of the join operation, and a good order can substantially improve optimization. Finding the optimal order, however, can be costly, because join ordering is an NP-hard problem [3].

This article addresses join ordering. It attempts to reduce query response time by making join processing adaptable, and it analyzes the effect of the number of agents on the optimization of query processing.

II. EARLIER METHODS

Several techniques have been proposed to extend the query optimization process to solve some of these problems [8]:
(A) Incorporating feedback from previous query executions to obtain better selectivity/cardinality estimates
(B) Parametric techniques that systematically postpone certain decisions as late as possible
(C) Least-expected-cost optimization techniques, which consider the distribution of possible parameter values rather than a single estimate
These techniques rely on static statistics about the data and have limited scope. Other techniques take a different route based on adaptivity; two of them are [8]:
(A) Selection ordering: choosing the order in which a given set of commutative filters (selections) is applied to all the tuples of a relation. Newer techniques in this area use greedy algorithms that continuously monitor the properties of the tuples and adapt the processing order.
(B) Adaptive join processing: designing and analyzing adaptive techniques for join queries is more complicated than for selection ordering, because the space of possible executions is much larger and more complex. This approach divides query executions into:
(B.1) Independent pipelined executions
(B.2) Dependent pipelined executions
(B.3) Non-pipelined executions
and proposes a solution for each. However, each of these techniques has its own problems, for example those related to parametric query optimization, parallelism, and the large memory needed during execution.

Various methods have been proposed for optimizing join query processing [12]. One long-standing approach is dynamic programming, in which query processing is split into two phases, optimization and execution: each query is first optimized and then executed to produce the answer. This approach has been refined over the years (for example, with additional heuristics, histograms, and query rewriting), but in all cases the core architecture is still that of System R. Meanwhile, query processing is increasingly needed in new settings [9], such as data streams, resource-constrained environments, and transactions of local queries [15], and in these settings this approach imposes limitations.

However, other methods can be used to optimize query processing in distributed databases. Three general approaches have been proposed for optimizing join ordering [3]:


• Deterministic methods
• Genetic methods
• Probabilistic methods
Deterministic methods use dynamic programming. Their execution time is high, and the search space they must explore becomes very large when a query contains more than 15 joins. Genetic and probabilistic algorithms, on the other hand, do not guarantee the best possible join order, but they can handle more joins in less time. How close they come to the optimum depends on the parameters chosen for the algorithm.
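The cost of the deterministic approach can be made concrete with a minimal Python sketch of System R-style dynamic programming over left-deep join orders. This is an illustration of the general technique, not the algorithm of [3]: the function name `dp_join_order`, the restriction to left-deep plans, the assumption that join predicates are independent, and the cost model (sum of the estimated sizes of all join results) are assumptions made for the example.

```python
from itertools import combinations
from math import prod
from typing import Dict, FrozenSet, Tuple


def dp_join_order(card: Dict[str, float],
                  sel: Dict[FrozenSet[str], float]) -> Tuple[float, Tuple[str, ...]]:
    """Dynamic programming over left-deep join orders (illustrative sketch).

    card: estimated row count of each base table.
    sel:  selectivity of the join predicate between two tables; a missing
          pair is treated as a cross product (selectivity 1.0).
    Cost model (assumption): sum of the estimated sizes of every join result.
    """
    tables = list(card)

    def size(subset: FrozenSet[str]) -> float:
        # Estimated rows after joining all tables in `subset`,
        # assuming the join predicates are independent.
        rows = float(prod(card[t] for t in subset))
        for a, b in combinations(sorted(subset), 2):
            rows *= sel.get(frozenset((a, b)), 1.0)
        return rows

    # best[S] = (cost, order) of the cheapest left-deep plan that joins set S.
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):          # one pass per subset size
        for combo in combinations(tables, k):
            s = frozenset(combo)
            for last in combo:                   # table joined last
                rest = s - {last}
                cost = best[rest][0] + size(s)
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, best[rest][1] + (last,))
    return best[frozenset(tables)]
```

The table `best` holds one entry per non-empty subset of tables, so an n-table query needs 2^n - 1 entries (32,767 for 15 tables), and each entry tries every possible last table. This exponential growth is the search-space problem the text attributes to deterministic methods, and it is what genetic and probabilistic methods avoid by sampling the space instead of enumerating it.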
In this article, a method is proposed that optimizes join ordering by making it adaptable over the long term, so that what is learned can be reused by the database. The use of multi-agent systems in heterogeneous distributed databases is also discussed.

III. THE PROPOSED ALGORITHM

Several multi-agent systems have been developed for query processing in distributed databases [15]; an example is shown in Figure 1.

Fig. 1: A Multi-agent System [3]

In this architecture, several agents are used, each with a specific function. These agents and their functions are as follows [3]:
• Distributer Agent: receives a query, divides it into a set of sub-queries, and sends each sub-query to the local query optimizers.
• Local Optimizer Agent: sends the size of the tables in each sub-query to the global optimizer.
• Global Optimizer Agent: to find the best possible join order, this agent uses the results of the local optimizer agents, the table sizes [21], and a genetic algorithm. It finds the best or a near-best join order and sends it to the local optimizer agents; the query is then executed and the result is returned to the user. In this genetic algorithm, chromosomes are represented as trees, crossover is single-point, the mutation operator changes some bits, an operation that would produce invalid data is not applied to the chromosome, and the fitness function steers the search toward the best possible order.

In this article, a new agent is added that optimizes and modifies the join ordering process in distributed databases. The authors proposed adding such a query adapter agent to the system architecture in previous works [5, 6, 7, 8]; this article studies the effect of the number of agents in the system. In the proposed method, a new agent called the adapter is added to reduce the workload of the join ordering process. By customizing queries and turning them into default settings, it builds adaptive templates that can be reused over time. Figure 2 shows the schema of the proposed system.

Fig. 2: Proposed Multi-agent System

In this method, the agents not only execute the algorithm but also make the system adaptive. The adapter agent (AQA) is responsible for adaptivity, so that over time the system can use the adapted information for join ordering without performing any further operations on sub-queries. The following paragraphs analyze the mechanism of the AQA and its adaptivity.

By combining join ordering techniques with agent techniques, the algorithm builds a multi-agent system that gathers information from the processing of users' queries and tries to provide a customized join ordering environment for the queries sent to the database.

To implement the algorithm, an agent is added to the DBMS to make the database adaptable. By reducing the number of steps needed to process join queries that are sent to the database repeatedly, query processing in distributed databases can be made faster. The proposed algorithm consists of the following parts:
• Join command separator
• Replacement policy
• Query similarity identifier
In this algorithm, queries are first examined and join queries are separated from the others. Queries without joins are treated as exceptions and processed normally; join queries are sent to the similarity identifier, which searches the adaptable templates for a similar query. If a similar query is found, its query plan is reused; otherwise the query is executed normally. In addition, another part of the system identifies adaptable queries and places them in the system.
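The flow just described, together with the replacement step, can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation: the class `AdapterAgent`, the `build_plan` callback standing in for normal optimization, the regular expressions used as the join separator and for unifying queries, and the fixed `bank_size` are all assumptions. Similarity is reduced here to exact equality after literals are replaced by `?`, which is enough to match the two tblKala queries used as an example in this section; the paper computes the replacement interval dynamically, which the sketch leaves to the caller.

```python
import re
from collections import Counter
from typing import Callable, Dict

JOIN_RE = re.compile(r"\bjoin\b", re.IGNORECASE)   # crude join separator (assumption)
STRING_RE = re.compile(r"'(?:[^']|'')*'")          # 'text' literals
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")       # numeric literals such as 20 or 3.5


def normalize(sql: str) -> str:
    """Unify a query: replace literals with '?', lower-case, collapse whitespace."""
    sql = STRING_RE.sub("?", sql)
    sql = NUMBER_RE.sub("?", sql)
    return " ".join(sql.lower().split())


class AdapterAgent:
    """Hypothetical adapter agent: separator, similarity identifier, query bank."""

    def __init__(self, build_plan: Callable[[str], str], bank_size: int = 100):
        self.build_plan = build_plan          # stands in for normal optimization
        self.bank_size = bank_size            # number of templates kept after replacement
        self.bank: Counter = Counter()        # normalized query -> weight
        self.templates: Dict[str, str] = {}   # normalized query -> stored plan

    def plan_for(self, sql: str) -> str:
        if not JOIN_RE.search(sql):           # exception: no join, plan normally
            return self.build_plan(sql)
        key = normalize(sql)
        self.bank[key] += 1                   # similar query seen again: raise its weight
        if key in self.templates:             # similarity identifier found a template
            return self.templates[key]
        return self.build_plan(sql)           # no similar template: plan normally

    def replace(self) -> None:
        """Run when the database is quiet: keep only the highest-weighted queries."""
        top = [key for key, _ in self.bank.most_common(self.bank_size)]
        self.templates = {key: self.build_plan(key) for key in top}
```

Keeping `replace` separate from `plan_for` mirrors the idea that adaptation is deferred to quiet periods, so the per-query path only pays for normalization and a dictionary lookup.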


Algorithm for Query Processing Optimization in Databases
1. Start.
2. Analyze join queries using the separator.
3. If the query is an exception, create the query plan normally.
4. If not, search for an existing query plan using the similarity identifier.
5. If one exists, select that query plan.
6. If not, create the query plan normally.
7. Execute the query plan to answer the query.
8. Check whether it is time for query replacement.
9. If it is, perform the replacement.
10. Finish.
Fig. 3: The Proposed Algorithm in General

Figure 3 shows the algorithm in pseudocode. The following parts describe each of its sections in detail.

JOIN COMMAND SEPARATOR
Many different queries are sent to the database, and depending on their type and structure, executing them can be very costly for the database. Different optimization methods try to reduce this cost [13]. Some queries, however, are exceptions: they contain no join operations, and some queries are cheap to execute anyway. The separator therefore distinguishes queries without joins from the join queries that can be made adaptable.

QUERY SIMILARITY IDENTIFIER
To make query processing adaptable, the system needs a part that can compare queries and identify similar ones. For example, consider the following two queries:

    SELECT * FROM tblKala INNER JOIN tblHavaleKala
        ON tblKala.KalaID = tblHavaleKala.KalaID INNER JOIN
        tblHavale ON tblHavaleKala.HavaleID = tblHavale.HavaleID
    WHERE tblKala.KalaID = 20

    SELECT * FROM tblKala INNER JOIN tblHavaleKala
        ON tblKala.KalaID = tblHavaleKala.KalaID INNER JOIN
        tblHavale ON tblHavaleKala.HavaleID = tblHavale.HavaleID
    WHERE tblKala.KalaID = 31

These queries ask for information about KalaID 20 and KalaID 31, respectively, but their structure is identical, so they can be executed with the same query plan. This part of the system must be able to identify such similarities. Note that all queries are unified and standardized before they are compared.

REPLACEMENT POLICY
An important question in this article is how, when, and for what purpose queries are replaced. Adding an agent to the database that adapts queries continuously is costly for the system. Therefore, to make the system adaptable, the queries sent to the database are monitored and analyzed, and similar queries that occur frequently are replaced by a query plan stored in the database, which then serves subsequent queries. (Only queries selected by the separator reach this part.)

The interval between two adaptation operations is calculated dynamically from the scores of the adapted queries: the higher their score, the more likely the adaptable queries are to stay in the system. Over time, the scores of adaptable queries increase, and so does the replacement interval. Queries sent to the database at rush hour are also made adaptable: the queries received are saved and made adaptable during the adaptation operation, which runs when the database is less busy.

For the adaptation process, a bank of the queries sent to the database is maintained. If an incoming query is similar to one already in the bank, the weight of that entry is increased; otherwise, the query is added to the bank and made adaptable. When the adaptation period is over, the queries with the highest scores are kept in the bank. Queries are stored in a standardized form to reduce the cost of comparing them.

IV. EXPERIMENTS & RESULTS

There are many ways to evaluate database performance; one of them is the system's runtime, that is, the time from when a command is sent to the database until a response is received. Section III presented a technique for making the database adaptable, with the aim of optimizing query processing in heterogeneous distributed databases. For evaluation, the method was designed and implemented as an object-oriented system. The implementation reports the following measurements:
• The cost of normal query execution
• The cost of the proposed algorithm
• The cost of adaptable query execution using a stored query plan
For the evaluation, the second and third costs are added together and compared with the first.

To compare the effect of the number of agents in adaptable heterogeneous distributed databases, these measurements were compared in two databases with different numbers of agents. The first test was run on a heterogeneous distributed database with two agents: our agent was added to the system, and the results were compared with a normal database, as shown in the following chart. In this test, the time needed to execute the queries sent to the database was compared between an adaptable and a non-adaptable database.
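The comparison described above (adding the second and third costs and comparing the sum with the first) amounts to a percentage reduction. A minimal Python sketch, assuming each measurement records the three costs for the same workload; the names `Measurement` and `reduction_percent` are hypothetical and not part of the described system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Measurement:
    normal: float     # cost of normal query execution
    algorithm: float  # cost of the proposed algorithm (separator, identifier, lookup)
    adaptable: float  # cost of executing the query through a stored query plan


def reduction_percent(runs: Sequence[Measurement]) -> float:
    """Percent by which (algorithm + adaptable) is cheaper than normal execution."""
    normal = sum(m.normal for m in runs)
    proposed = sum(m.algorithm + m.adaptable for m in runs)
    if normal == 0:
        raise ValueError("normal execution cost must be positive")
    return 100.0 * (normal - proposed) / normal
```

Like the charts of Figures 4 and 6, this sketch leaves out the cost of the periodic replacement step; a positive result means the adaptable database answered the workload faster than the normal one.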


Fig. 4: A report of the system's daily response time for adapted queries (queries sent on different days) in a two-agent system

Figure 4 shows the time needed to execute adaptable and non-adaptable queries: the chart plots the total time needed to execute the adaptable queries in the adaptable database and the time needed to execute the same queries in non-adaptable mode. Note that the chart does not include the cost of adaptation, because the system is not adapted continuously; adaptation takes place only at specific times when the system is not busy. This cost will be included in further evaluations.

Fig. 5: A report of the system's daily response time at rush hours in a two-agent system

Figure 5 shows the time required to execute queries during the system's daily rush hours, that is, times when many queries are sent to the database for execution. This figure is shown because the queries used for adaptation are those sent to the database at busy times.

The next evaluation repeats the comparison in a three-agent system: the system is made adaptable using the proposed method, and the results are shown in Figures 6 and 7.

Fig. 6: A report of the system's daily response time for the received adaptable queries (queries sent on different days) in a three-agent system

Figure 6 shows the time required each day to answer adaptable and non-adaptable queries: the total time needed to execute the adaptable queries in the adaptable database and the time needed to execute the same queries in non-adaptable mode. Again, the cost of adaptation is not included in this chart.

Fig. 7: A report of the system's daily response time at rush hours in a three-agent system

Figure 7 likewise shows the time required to execute queries during the system's daily rush hours.

The first row of Table 1 shows the reduction in response time for adaptable join queries and for all queries sent to the database in the two-agent system.

TABLE 1: EVALUATION RESULTS
Row | Number of agents | Reduction for adaptable join queries | Reduction for all queries
1   | 2                | 29%                                  | 3%
2   | 3                | 31%                                  | 3.2%

As Table 1 shows, in the two-agent system the method reduced the response time of all queries by 3 percent and that of adaptable join queries by 29 percent. The second row shows the corresponding reductions in runtime for join query processing in the three-agent system: 31 percent for adaptable join queries and 3.2 percent for all queries.

V. CONCLUSION

This article has presented an optimized method for join query processing in distributed databases. In this method, the system is made adaptable using the queries that users send to the database over the long term. Optimization was also analyzed in terms of the number of agents. The method has several benefits: a query plan is available for ordering the tables in user queries, and the way the information bank is accessed and servers respond to users is kept under control, so some repetitive operations in join ordering are avoided. The test results indicate that the proposed algorithm can model user queries and make distributed databases adaptable to user demands.

REFERENCES

[1] Amol Deshpande, Zachary Ives, Vijayshankar Raman, "Adaptive Query Processing: Why, How, When, What Next?", VLDB '07, September 23-28, 2007, Vienna, Austria. Copyright 2007 VLDB Endowment, ACM 978-1-59593-649-3/07/09.


[2] Reza Ghaemi, Amin Milani Fard, Hamid Tabatabaee, and Mahdi Sadeghizadeh, "Evolutionary Query Optimization for Heterogeneous Distributed Database Systems", PWASET, Volume 33, September 2008, ISSN 2070-3740.
[3] Amol Deshpande, Zachary Ives, and Vijayshankar Raman, "Adaptive Query Processing", Foundations and Trends in Databases, 1(1), 2007.
[4] J. Callan, "Distributed Information Retrieval", in Advances in Information Retrieval, W. B. Croft, Ed., Kluwer Academic Publishers, 2000, pp. 127-150.
[5] Elnaz Zafarani, Mohammad-Reza Feizi-Derakhshi, Hasan Asil, Amir Asil, "Presenting a New Method for Optimizing Join Queries Processing in Heterogeneous Distributed Databases", WKDD 2010, Phuket, Thailand, 9-10 January 2010.
[6] Mohammad-Reza Feizi-Derakhshi, Hasan Asil, Amir Asil, Elnaz Zafarani, "Practical Software Query Optimizing by Adapting Why and How?", Australian Journal of Basic and Applied Sciences, January 2010.
[7] Mohammad-Reza Feizi-Derakhshi, Hasan Asil, Amir Asil, Elnaz Zafarani, "Optimizing Query Processing in Practical Software Database by Adapting", WKDD 2010, Phuket, Thailand, 9-10 January 2010.
[8] Mohammad-Reza Feizi-Derakhshi, Hasan Asil, Amir Asil, "Proposing a New Method for Query Processing Adaption in Data Base", WCSET 2009: World Congress on Science, Engineering and Technology, Dubai, United Arab Emirates, Volume 37, January 28-30, 2009, ISSN 2070-3740.
[9] A. Deshpande, J. M. Hellerstein, "Lifting the Burden of History from Adaptive Query Processing", VLDB 2004.
[10] Amol Deshpande, "An Initial Study of Overheads of Eddies", SIGMOD Record, March 2004.
[11] T. Urhan, M. Franklin, L. Amsaleg, "Cost Based Query Scrambling for Initial Delays", SIGMOD 1998.
[12] V. Raman, V. Markl, D. Simmen, G. Lohman, "Progressive Optimization in Action", demo, VLDB 2004.
[13] V. Raman, B. Raman, J. Hellerstein, "Online Dynamic Reordering for Interactive Data Processing", VLDB 1999.
[14] S. Babu, D. DeWitt, J. Widom, "Content-Based Routing: Different Plans for Different Data", VLDB 2005.
[15] Ron Avnur, Joseph M. Hellerstein, "Eddies: Continuously Adaptive Query Processing", SIGMOD 2000.
[16] A. Deshpande, C. Guestrin, "Exploiting Correlated Attributes in Acquisitional Query Processing", ICDE 2005.
[17] Antoshenkov, "Query Processing in DEC Rdb: Major Issues and Future Challenges", IEEE Data Engineering Bulletin.
[18] Ming-Syan Chen, Philip S. Yu, Kun-Lung Wu, "Optimization of Parallel Execution for Multi-Join Queries", IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 3, June 1989.
[19] David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, and Rick Rasmussen, "The Gamma Database Machine Project", IEEE Transactions on Knowledge and Data Engineering, 2(1):44-62, 1990.
[20] C. K. Baru and O. Frieder, "Database Operations in a Cube-Connected Multiprocessor System", IEEE Transactions on Computers, 38(6):920-927, June 1989.
[21] Remzi H. Arpaci-Dusseau, "Run-time Adaptation in River", Transactions on Computer Systems (TOCS), 21(1):36-86, 2003.
[22] Ozalp Babaoglu, Lorenzo Alvisi, Alessandro Amoroso, Renzo Davoli, and Luigi Alberto Giachini, "Paralex: An Environment for Parallel Programming in Distributed Systems", pp. 178-187, ACM Press, New York, NY, USA, 1992.

Hasan Asil works in the Department of Computers at Islamic Azad University, Azarshahr Branch. In 2010 he became a Ph.D. student in Project Management at Management University in Switzerland. His areas of interest are query optimization in databases, software development methodologies, and personality and management in software engineering.
