Activity 9 Advance DBM Query Execution Research 5 DARWIN G RARALIO
MSIT-GRADUATE SCHOOL
Republic of the Philippines
Cagayan State University
CARIG CAMPUS
Carig Sur, Tuguegarao City
Query Compilation
Three Parts:
The physical-query-plan operator sort-scan can be implemented in many ways. One example is to
use a B-tree index on the sort attribute a.
Table-scan:
If R is clustered, the scan needs B disk I/Os, where B is the number of blocks of R.
If R is not clustered, the scan could need up to T disk I/Os, one for each of the T tuples of R.
Index-Scan:
If the column data is contained in the index, as in
SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100;
the query does not need to access the table.
The number of index blocks read is often smaller than B.
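The table-scan bounds above can be written as a small cost function. This is only a sketch: the function name and the sample values of B and T are assumptions chosen for illustration.

```python
def table_scan_cost(B: int, T: int, clustered: bool) -> int:
    """Upper bound on disk I/Os for scanning relation R.

    B is the number of blocks of R and T is the number of tuples of R.
    """
    # Clustered: the tuples are packed into B blocks, so B reads suffice.
    # Not clustered: in the worst case every tuple is on a different block.
    return B if clustered else T


# Hypothetical relation: 20,000 tuples stored in 1,000 blocks.
clustered_cost = table_scan_cost(B=1000, T=20000, clustered=True)
unclustered_cost = table_scan_cost(B=1000, T=20000, clustered=False)
print(clustered_cost, unclustered_cost)
```

With these assumed numbers, the unclustered scan may cost twenty times as many I/Os as the clustered one.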
Nested-Loop Joins
A relational database system uses SQL as the language for querying and maintaining
databases. To see the data of two or more tables together, we need to join the tables.
SQL is a declarative language: we write a query according to the SQL language standard and ask
the database to fulfill the request. It is then the responsibility of the database to fulfill the
user's request optimally. Fortunately, SQL Server has the Query Optimizer for this.
We need not worry about how things actually happen inside SQL Server, but it is good to
know what is happening behind the curtain so that we can figure out why a query
is running slowly.
The execution plan has many iterators for different operations, but in this
article we will learn about only one of them, the Nested Loop Join. It is a physical join
iterator. Whenever you logically join one table to another, the Query Optimizer chooses
one of three physical join iterators based on cost: Hash
Match, Nested Loop Join, or Merge Join. This article focuses only on the Nested Loop Join,
so let us move quickly to the joining part.
Look at the above two result sets. The StudentInfo table has the students' information: roll
number, name, and address columns. The Attendance table contains the daily attendance of the
students: the student's roll number, present, and attendance date columns.
Suppose you want to see each student's name, address, present, and attendance date in a new
spreadsheet, and you can use only one row of a table at a time. How would you do that?
In all probability, some of us will start with roll number 1 of the StudentInfo table.
We shall copy the student's name and address from the StudentInfo table and paste them into a new
spreadsheet.
Then we shall copy every present value and date from the Attendance table for roll number 1 and
paste them into the spreadsheet.
Then we will repeat the same process for roll number 2 and roll number 3. After completing
all of them, the final result set will look somewhat like this:
If we convert what we did above into pseudocode, it will look like this:
For each row in the StudentInfo table
    For each row in the Attendance table
        If StudentInfo.rollnumber = Attendance.rollnumber
            Return (StudentInfo.name, StudentInfo.Address, Attendance.Present,
                    Attendance.AttandanceDate)
Congratulations, now you already know how Nested Loop Join works.
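The spreadsheet procedure can also be sketched in runnable form. The rows below are made-up sample data, and the dictionary keys and the function name are assumptions for illustration. This is not how SQL Server implements the iterator internally.

```python
from typing import Iterator

# Hypothetical sample rows standing in for the two tables.
student_info = [
    {"rollnumber": 1, "name": "Student A", "address": "Address A"},
    {"rollnumber": 2, "name": "Student B", "address": "Address B"},
    {"rollnumber": 3, "name": "Student C", "address": "Address C"},
]
attendance = [
    {"rollnumber": 1, "present": True, "date": "2024-01-01"},
    {"rollnumber": 1, "present": False, "date": "2024-01-02"},
    {"rollnumber": 2, "present": True, "date": "2024-01-01"},
    {"rollnumber": 2, "present": True, "date": "2024-01-02"},
    {"rollnumber": 3, "present": False, "date": "2024-01-01"},
]


def nested_loop_join(outer: list, inner: list) -> Iterator[tuple]:
    """For every outer row, scan the whole inner table for matching rows."""
    for s in outer:            # outer loop: one StudentInfo row at a time
        for a in inner:        # inner loop: every Attendance row
            if s["rollnumber"] == a["rollnumber"]:
                yield (s["name"], s["address"], a["present"], a["date"])


rows = list(nested_loop_join(student_info, attendance))
for row in rows:
    print(row)
```

The inner table is scanned once for every outer row, so the work grows with the product of the two table sizes.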
The output buffer may be the input buffer of another operation, so it is not counted.
Thus, the algorithm requires only M = 1 buffer block.
The I/O cost is B(R).
If an index is applicable to a selection, we have to read only the blocks that contain qualifying
tuples.
Data from the operand relation is read into main memory, processed, written out to disk again,
and then reread from disk to complete the operation.
Phase 1: Repeatedly fill the M buffers with new tuples from R and sort them, using any
main-memory sorting algorithm. Write out each sorted sublist to secondary storage.
Phase 2: Merge the sorted sublists. For this phase to work, there can be at most M - 1 sorted
sublists, which limits the size of R. We allocate one input block to each sorted sublist and one
block to the output.
Merging
Find the smallest key among the first remaining elements of all the sublists.
Move the smallest element to the first available position of the output block.
If the output block is full, write it to disk and reinitialize the same buffer in main memory to hold the
next output block.
If the block from which the smallest element was taken is now exhausted of records, read the next
block from the same sorted sublist into the same buffer that was used for the block just exhausted.
If no blocks remain, stop.
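The two phases can be simulated in memory as follows. This is a minimal sketch under assumptions: plain Python lists stand in for disk blocks, `block_size` (tuples per block) is a made-up parameter, and `heapq.merge` plays the role of the merging loop that repeatedly picks the smallest key.

```python
import heapq


def two_phase_merge_sort(tuples: list, M: int, block_size: int) -> list:
    """Simulate two-phase multiway merge sort with M buffer blocks."""
    capacity = M * block_size  # tuples that fit in main memory at once

    # Phase 1: fill the M buffers, sort them, "write out" a sorted sublist.
    sublists = [
        sorted(tuples[i:i + capacity])
        for i in range(0, len(tuples), capacity)
    ]

    # Phase 2 needs one input buffer per sublist plus one output buffer.
    if len(sublists) > M - 1:
        raise ValueError("R is too large: more than M - 1 sorted sublists")

    # Merge: repeatedly take the smallest head element among the sublists.
    return list(heapq.merge(*sublists))


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
result = two_phase_merge_sort(data, M=3, block_size=2)
print(result)
```

With M = 3 and two tuples per block, the ten tuples form two sorted sublists, which is within the limit of M - 1 = 2.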
The essential idea behind all these hash-based algorithms is this: if the data is too big to store in
main memory, hash all the tuples of the argument or arguments using an appropriate hash
key.
Two copies of the same tuple t will hash to the same bucket.
We can therefore examine one bucket at a time, perform the operation (for example, duplicate
elimination) on that bucket in isolation, and take as the answer
the union of the per-bucket results, where Ri is the portion of R that hashes to the ith bucket.
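As an illustration, here is a sketch of hash-based duplicate elimination, which is one operation this bucket-at-a-time idea applies to. The function names and the sample relation are assumptions, and in-memory lists stand in for buckets that would be written to disk.

```python
def hash_partition(tuples: list, num_buckets: int) -> list:
    """First pass: send every tuple to the bucket chosen by its hash."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    return buckets


def hash_distinct(tuples: list, num_buckets: int) -> list:
    """Second pass: remove duplicates inside each bucket, then union."""
    result = []
    for bucket in hash_partition(tuples, num_buckets):
        seen = set()  # only one bucket has to fit in memory at a time
        for t in bucket:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


R = [(1, 2), (3, 4), (1, 2), (5, 6), (3, 4), (7, 8)]
distinct = hash_distinct(R, num_buckets=3)
print(sorted(distinct))
```

Copies of the same tuple always land in the same bucket, so no duplicate can survive across buckets.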
Index-Based Algorithms
A clustering index is an index in which all tuples with a fixed value for the search key are
packed into the minimum possible number of blocks.
Index-Based Selection
Selection on equality :
When the index is a B-tree, or any other structure from which we can easily extract
the tuples of a relation in sorted order, we have a number of other opportunities to use the index.
Perhaps the simplest is when we want to compute R(X,Y) ⋈ S(Y,Z) and we have such an
index on Y for either R or S. We can then perform an ordinary sort-join, but we do not have
to perform the intermediate step of sorting one of the relations on Y.
As an extreme case, if we have sorting indexes on Y for both R and S, then we need
to perform only the final step of the simple sort-based join. This method is sometimes called
zig-zag join, because we jump back and forth between the indexes, finding Y-values that they
share in common. Notice that tuples from R with a Y-value that does not appear in S need
never be retrieved, and similarly, tuples of S whose Y-value does not appear in R need not be
retrieved.
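A zig-zag join can be sketched with two sorted lists standing in for the two indexes on Y. This is an illustration under assumptions: each index is modeled as a list of (y, rest-of-tuple) pairs sorted by y, and binary search from the standard `bisect` module imitates jumping forward in a B-tree.

```python
from bisect import bisect_left, bisect_right


def zigzag_join(r_index: list, s_index: list) -> list:
    """Join R(X,Y) and S(Y,Z) using sorted (y, value) index entries."""
    r_keys = [y for y, _ in r_index]
    s_keys = [y for y, _ in s_index]
    i = j = 0
    out = []
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            # Jump forward in R's index to the first key >= S's current key.
            i = bisect_left(r_keys, s_keys[j], i)
        elif r_keys[i] > s_keys[j]:
            # Jump forward in S's index to the first key >= R's current key.
            j = bisect_left(s_keys, r_keys[i], j)
        else:
            y = r_keys[i]
            i_end = bisect_right(r_keys, y, i)
            j_end = bisect_right(s_keys, y, j)
            # Only now are the matching tuples actually retrieved.
            for _, x in r_index[i:i_end]:
                for _, z in s_index[j:j_end]:
                    out.append((x, y, z))
            i, j = i_end, j_end
    return out


r_idx = [(1, "a"), (3, "b"), (3, "c"), (7, "d")]
s_idx = [(2, "p"), (3, "q"), (7, "r"), (7, "s"), (9, "t")]
joined = zigzag_join(r_idx, s_idx)
print(joined)
```

In this sample, the entries with Y-values 1, 2, and 9 are skipped without their tuples ever being fetched.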
Models of Parallelism
Shared-nothing machines are relatively inexpensive to build, but when we design
algorithms for these machines we must be aware that it is costly to send data from one
processor to another.
Typically, the cost of a message can be broken into a large fixed overhead plus a small
amount of time per byte transmitted.
There is therefore a significant advantage to designing a parallel algorithm so that communications
between processors involve large amounts of data sent at once.
For instance, we might buffer several blocks of data at processor P, all bound for
processor Q.
If Q does not need the data immediately, it may be much more efficient to wait until we have
a long message at P and then send it to Q.
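The benefit of batching follows directly from the cost model of a fixed overhead plus a per-byte charge. The sketch below uses made-up cost units and block sizes. Only the shape of the formula comes from the text.

```python
def message_cost(num_bytes: int, fixed_overhead: int, per_byte: int) -> int:
    """Cost of one message: a fixed overhead plus a charge per byte."""
    return fixed_overhead + per_byte * num_bytes


def cost_sent_separately(num_blocks, block_bytes, fixed_overhead, per_byte):
    """Every block travels in its own message and pays the overhead again."""
    return num_blocks * message_cost(block_bytes, fixed_overhead, per_byte)


def cost_sent_batched(num_blocks, block_bytes, fixed_overhead, per_byte):
    """All blocks are buffered at P and sent to Q as one long message."""
    return message_cost(num_blocks * block_bytes, fixed_overhead, per_byte)


# Hypothetical units: overhead of 1000 per message, 1 per byte, 4096-byte blocks.
separate = cost_sent_separately(10, 4096, fixed_overhead=1000, per_byte=1)
batched = cost_sent_batched(10, 4096, fixed_overhead=1000, per_byte=1)
print(separate, batched)
```

The per-byte part is the same either way. Batching saves the fixed overhead of all but one message.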
First we have to decide how data is best stored. It is useful to distribute our data
across as many disks as possible.
Assume there is one disk per processor. Then if there are p processors, divide any
relation R's tuples evenly among the p processors' disks.
Suppose we want to perform a selection σ_C(R). We use each processor to examine the tuples of R
present on its own disk. To avoid communication among processors, we store each output
tuple t at the same processor that has t on its disk.
Thus, the result relation is divided among the processors, just as R is.
We would like the result to be divided evenly among the processors. However, a selection
could radically change the distribution of tuples in the result, compared to the distribution of
R.
Selection
Suppose the selection is σ_{a=10}(R), that is, find all the tuples of R whose value in the attribute a is 10.
Suppose also that we have divided R according to the value of the attribute a. Then all tuples
of R with a = 10 are at one of the processors, and the entire result relation is at one
processor.
To avoid this problem, we need to think carefully about the policy for partitioning our stored
relations among the processors. The best we can do is to use a hash function h that involves all
the components of a tuple. The number of buckets is the number of processors. We can associate
each processor with a bucket and give that processor the contents of its bucket.
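The two partitioning policies can be compared with a small simulation. Everything here is an assumption for illustration: lists stand in for the processors' disks, Python's built-in `hash` plays the role of h, and the sample relation has only two distinct values of a.

```python
def partition_by_attribute(tuples: list, p: int, attr: int = 0) -> list:
    """Assign each tuple to a processor using only the value of one attribute."""
    parts = [[] for _ in range(p)]
    for t in tuples:
        parts[hash(t[attr]) % p].append(t)
    return parts


def partition_by_whole_tuple(tuples: list, p: int) -> list:
    """Assign each tuple using a hash over all of its components."""
    parts = [[] for _ in range(p)]
    for t in tuples:
        parts[hash(t) % p].append(t)
    return parts


def parallel_select(parts: list, predicate) -> list:
    """Each processor filters its own tuples and keeps the result locally."""
    return [[t for t in part if predicate(t)] for part in parts]


# Hypothetical R(a, b): eight tuples with a = 10 and eight with a = 20.
R = [(10, b) for b in range(8)] + [(20, b) for b in range(8)]


def is_ten(t):
    return t[0] == 10


by_attr = parallel_select(partition_by_attribute(R, p=4), is_ten)
by_tuple = parallel_select(partition_by_whole_tuple(R, p=4), is_ten)
print([len(part) for part in by_attr])
print([len(part) for part in by_tuple])
```

When R is partitioned on a alone, the whole result ends up at a single processor. Hashing the whole tuple typically spreads the same result over several processors, although an even split is not guaranteed.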
Simple Catalog
nr: number of tuples in relation r
br: number of blocks containing tuples in r
sr: size of a tuple in relation r (bytes)
fr: blocking factor of r
V(A,r): number of distinct values in r for attribute A
SC(A,r): selection cardinality, the average number of records satisfying a condition on A
SC(A,r) = r(R)/V(A,r), where r(R) is the total number of records in R
e.g., SC(A,r) = 1 if A is a key of R and the condition is equality
HTi: number of levels in index i
Using a primary index: If the selection involves equality on a key attribute with a primary index,
use the primary index to retrieve (at most one) record.
E = HTi + 1
Using a primary index to retrieve multiple records: If the selection condition involves a range on a key
field with a primary index, use the index to find the record satisfying the corresponding equality
condition, then retrieve the subsequent records.
E = HTi + br/2, assuming half of the tuples satisfy the condition
E = HTi + ⌈c/fr⌉, if the actual value used in the comparison is known, where c is the estimated
number of records satisfying the condition
Using a clustering index to retrieve multiple records: If the selection condition involves an equality
comparison on a non-key attribute with a clustering index, use the index to retrieve all records
satisfying the condition.
E = HTi + ⌈SC(A,r)/fr⌉
Using a secondary index: If the selection condition involves equality or inequality on a key or
non-key field with a secondary index (non-ordering field), use the index to retrieve the records.
E = HTi + SC(A,r)
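These estimates can be turned into small helper functions. The sketch reads each estimate E as the index height HTi plus the number of blocks fetched afterwards. The function names and the sample catalog values are assumptions for illustration.

```python
from math import ceil


def selection_cardinality(n_r: int, distinct_values: int) -> float:
    """SC(A,r): average number of records matching an equality on A."""
    return n_r / distinct_values


def cost_primary_key_equality(ht_i: int) -> int:
    """Descend the index, then read the single block holding the record."""
    return ht_i + 1


def cost_primary_range(ht_i: int, b_r: int) -> float:
    """Range on a key field, assuming half of the tuples qualify."""
    return ht_i + b_r / 2


def cost_clustering_equality(ht_i: int, sc: float, f_r: int) -> int:
    """Matching records are contiguous, so they fill ceil(SC / fr) blocks."""
    return ht_i + ceil(sc / f_r)


def cost_secondary(ht_i: int, sc: float) -> float:
    """Worst case: every matching record is on a different block."""
    return ht_i + sc


# Hypothetical catalog values for a relation r and an index i.
n_r, b_r, f_r, distinct_a, ht_i = 10000, 500, 20, 50, 3
sc = selection_cardinality(n_r, distinct_a)
print(cost_primary_key_equality(ht_i))
print(cost_primary_range(ht_i, b_r))
print(cost_clustering_equality(ht_i, sc, f_r))
print(cost_secondary(ht_i, sc))
```

With these assumed values, the same equality condition costs far fewer block reads through a clustering index than through a secondary index, because the matching records share blocks.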
References:
https://slideplayer.com/slide/13028733/
https://www.sqlshack.com/introduction-to-nested-loop-joins-in-sql-server/
https://www2.cs.sfu.ca/CourseCentral/454/bzhou/documents/s22.pdf
https://slideplayer.com/slide/16096875/
https://slideplayer.com/slide/5030433/
https://www.powershow.com/view/f21b5-OTA5M/Basic_Algorithms_for_Executing_Query_Operations_powerpoint_ppt_presentation