
The University of British Columbia

Computer Science 404: Database Systems


Midterm #2, March 20, 2013. Instructor: E.M. Knorr

Time: 48 minutes. Only a simple, non-programmable, non-communicating calculator is permitted.


Closed book. No help sheets. No cell phones, no smartphones, etc.

Name (PRINT): (Last) (First)          Student No.:

Signature:

The examination has 8 pages, but that includes this cover sheet. Check that you have a complete paper.

Print your name and ID at the top of this page, and provide your signature. Have your student ID ready.

You do not need to print your name on any pages other than the cover page.

A simple, non-programmable, non-communicating calculator is permitted. No other aids are allowed.

Work quickly and do the easy questions first. Part marks are available.

The marks for each question are given in braces. Do not spend too much time on any one question.

To minimize disruptions during the exam, please avoid asking the invigilators for help, tips, or explanations.

Please write down any reasonable assumptions that you are making, if you believe that a question is ambiguous.

Marks
Question   Max.   Achieved
1          4
2          4
3          8
4          4
5          4
6          6
7          4
8          5
9          3
10         4
Total      46

Questions 1-2 are multiple choice {i.e., 4 marks each—you get 3 if you only get 1 wrong, 2 if you get 2
wrong, and 0 otherwise}. Circle ALL correct answers. There may be as few as 0 correct answers to a
given question, and as many as 5 correct answers—so be sure to read all parts of each question. (In case
of ambiguity, please write down any reasonable assumptions that you make, if any.)

1. {4 marks} Which of the following statements about indexes and/or joins are true?
a) If you are joining two tables A and B, on their productID field (i.e., this same field name
exists in both tables), and a hash index exists on B.productID, but there is no index at all on
table A, then you cannot use sort-merge join (SMJ) to join the tables.
b) It is possible for a composite hash index to be a clustering index (i.e., the data in the table is
appropriately clustered). Note: “Composite” means that multiple fields make up a single
“key”—i.e., they are concatenated.
c) There is usually no difference in the number of page I/Os for BNL, if we switch the order of
the outer table and the inner table for the join.
d) If we try to fill the index pages and the data pages to capacity (i.e., we aim for close to 100%
full), then it is possible for the number of pages in an index to exceed the number of pages in the
data table.
e) A primary key can be a composite key in an unclustered B+ tree index.

2. {4 marks} Which of the following statements are true?


a) For the sort phase (the first phase) of external mergesort, using full tracks for the input
buffers instead of using half cylinders is likely to speed up a large mergesort, if there are many
tracks per cylinder.
b) Hashing can be useful when trying to find duplicate keys in a large unsorted file.
c) External mergesort can be useful when trying to find duplicate keys in a large unsorted file.
d) A reduction factor value that’s very close to 1.00 (e.g., RF = 0.99) is useful in significantly
reducing the number of tuples selected from a table.
e) When looking up large ranges of search keys, it is usually better to use a B+ tree index than a
hash index.

Page 2
3. {8 marks} This question is about external mergesort. Suppose we have 51,500 4K pages of
unsorted data on contiguous cylinders. Suppose further that there are 500 pages per cylinder on the one-
and-only disk drive, and you have the disk drive all to yourself (i.e., there is no contention from other
processes).

We will use B=1000 buffer pages of size 4K for this question.

Like some of the online practice questions, let us only consider two types of seek operations: a long
seek (LS) and a short seek (SS). A short seek is a seek that is to an adjacent cylinder ONLY. In other
words, if we’re going directly from cylinder 8220 to cylinder 8218, then that’s not a short seek. Don’t
worry about computing any I/O times in milliseconds—for part (b), we’re only interested in the number
of long seeks and the number of short seeks. But, justify your answer by showing your work!
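The long-seek/short-seek distinction above can be sketched as a small helper. This is only an illustration of the definition: the function name `classify_seek` and the "none" result for a head that stays on the same cylinder are assumptions, not part of the exam.

```python
def classify_seek(from_cylinder: int, to_cylinder: int) -> str:
    """Classify a head movement using the two-category model described above.

    A short seek (SS) moves to an adjacent cylinder only; anything farther
    is a long seek (LS).
    """
    distance = abs(to_cylinder - from_cylinder)
    if distance == 0:
        # Assumption: staying on the same cylinder requires no seek at all.
        return "none"
    if distance == 1:
        return "SS"
    return "LS"


# The example from the text: 8220 -> 8218 skips a cylinder, so it is not short.
print(classify_seek(8220, 8218))  # LS
print(classify_seek(8220, 8219))  # SS
```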

a) How many sorted runs will there be after the sort phase ONLY (i.e., the first phase), and what are the
sizes of the sorted runs?

b) Compute the number of long seeks (LS) as well as the number of short seeks (SS) created during
Phase 1 (the sort phase) of the external mergesort ONLY. Do not do anything for the merge phase(s).

Page 3
USE THE INFORMATION ON THIS PAGE FOR THE REST OF THE QUESTIONS IN THIS EXAM.

Consider the following relations, similar to the ones you’ve seen in class, but containing a bit of extra
information (you don’t need to know what most of the extra fields are):

Boats (B) = 15 pages @ 20 tuples per (4K) page
Reserves (R) = 25,000 pages @ 100 tuples per page
Sailors (S) = 100 pages @ 30 tuples per page
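The page counts and tuples-per-page figures above imply a tuple count for each relation, which selectivity estimates often start from. A minimal sketch, assuming every page is full; the names `relations` and `tuple_count` are illustrative only.

```python
# Page counts and tuples per page, copied from the relation list above.
relations = {
    "Boats": (15, 20),
    "Reserves": (25_000, 100),
    "Sailors": (100, 30),
}


def tuple_count(pages: int, tuples_per_page: int) -> int:
    """Total number of tuples, assuming every page is full."""
    return pages * tuples_per_page


cardinalities = {name: tuple_count(*stats) for name, stats in relations.items()}
for name, count in cardinalities.items():
    print(f"{name}: {count:,} tuples")
```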

For the indexes indicated below, assume 4 bytes for numeric fields, 8 bytes for strings, and 10 bytes for
rids/pointers. You can use up to B=15 buffer pages, unless told otherwise.

Assume the existence of the following indexes for this question. You won’t need them all!

B – clustered B+ tree on bcolor
B – unclustered B+ tree on bname
B – hash index on bid

R – hash index on sid + bid + rdate (composite index)
R – unclustered B+ tree on bid
R – clustered B+ tree on rdate

S – clustered B+ tree on age
S – unclustered B+ tree on sname
S – unclustered B+ tree on rating + age (composite)
S – hash index on sid

Unless told otherwise, assume that it costs 1.2 I/Os to probe a hash index (i.e., to find the first bucket
containing a matching key), and 3 I/Os to probe a B+ tree index (i.e., to find the first leaf page
containing a matching key).

You can assume a uniform distribution for values in the tables; however, primary keys like B.bid,
S.sid, and R.bid+sid+rdate are, of course, unique.

Page 4
4. {4 marks} Compute the number of pages at each level of the unclustered B+ tree index for R.bid.
Show your work. Assume that each page is filled to capacity, where possible.

5. {4 marks} Suppose the Boats table included a field to indicate the age of the boat. Assuming
uniform distributions, if the optimizer knew that there were only 10 boat colors, and the ages of boats
ranged between 0 and 49 years (integer values), then how many boat tuples would qualify for the result
of the following query? Justify your answer by showing your work.

SELECT *
FROM Boats B
WHERE (B.color = 'red' AND B.age > 24)
OR
(B.age < 10);

Page 5
6. {6 marks} Estimate the cost of computing the following SQL query using the query tree/plan shown
below, making reasonable decisions along the way. Assume that only 1 in 200 people who are making
the reservations are named Steven. Justify your answer by showing your work.

SELECT *
FROM Reserves R, Boats B
WHERE R.rname = 'Steven'
AND
R.bid = B.bid;

              ⋈ INJ on bid
             /            \
       (pipeline)          \
   σ rname = 'Steven'      Boats
            |
        Reserves

7. {4 marks} (a) Suppose we have the following SQL query. Estimate (compute) the number of page
I/Os, assuming we make a good choice of query plan. You do not have to draw the query tree. Justify
your answer by showing your work.

SELECT count(*)
FROM Reserves R
WHERE R.bid = 500;

(b) If we changed “SELECT count(*)” to “SELECT *”, then would the number of page I/Os:
a) stay the same?
b) increase?
c) decrease?
Circle one of the above letters. There is no need to explain.

Page 6
8. {5 marks} Estimate the number of page I/Os to do a Sort-Merge Join (SMJ) on the Sailors and
Reserves tables. Note that they’ll join on the sid key. Assume that there are B = 80 buffer pages.
Justify your answer by showing your work.

9. {3 marks} Estimate the number of page I/Os to do an appropriate Block Nested Loop (BNL) join on
the Boats and Reserves tables. Note that they’ll join on the bid field. Assume that there are only B =
12 buffer pages. Show your work.

Page 7
10. {4 marks} Estimate the number of page I/Os to do a Hash Join (HJ) on the Boats and Reserves
tables. Assume B = 25 buffer pages. Furthermore, assume that there is a 20% “fudge factor” required
for the size of the hash table. (In other words, just for the sake of an example, if you planned to build a
10-page hash table without a fudge factor, it will now take 20% more space, or 12 pages in all, to build
the hash table.) Justify your answer by showing your work.
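The fudge-factor example in the parentheses above can be checked with a short sketch. The helper name `pages_with_fudge` is hypothetical, and rounding up to a whole page is an assumption; the exam only gives the 10-page to 12-page case.

```python
def pages_with_fudge(pages: int, fudge_percent: int) -> int:
    """Pages needed for a hash table once the fudge factor is applied.

    Integer arithmetic avoids floating-point surprises. Rounding up to a
    whole page is an assumption, not something the exam states.
    """
    return (pages * (100 + fudge_percent) + 99) // 100


# The example from the text: a 10-page hash table with a 20% fudge factor.
print(pages_with_fudge(10, 20))  # 12
```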

Page 8
