Midterm 13w2
Name Student No
(PRINT) (Last) (First)
Signature
Total 46
Questions 1-2 are multiple choice (4 marks each: you get 3 marks if you get only 1 wrong, 2 marks if you
get 2 wrong, and 0 otherwise). Circle ALL correct answers. There may be as few as 0 correct answers to a
given question, and as many as 5 correct answers, so be sure to read all parts of each question. (In case
of ambiguity, please write down any reasonable assumptions that you make.)
1. {4 marks} Which of the following statements about indexes and/or joins are true?
a) If you are joining two tables A and B, on their productID field (i.e., this same field name
exists in both tables), and a hash index exists on B.productID, but there is no index at all on
table A, then you cannot use sort-merge join (SMJ) to join the tables.
b) It is possible for a composite hash index to be a clustering index (i.e., the data in the table is
appropriately clustered). Note: “Composite” means that multiple fields make up a single
“key”—i.e., they are concatenated.
c) There is usually no difference in the number of page I/Os for BNL, if we switch the order of
the outer table and the inner table for the join.
d) If we try to fill the index pages and the data pages to capacity (i.e., we aim for close to 100%
full), then it is possible for the number of pages in an index to exceed the number of pages in the
data table.
e) A primary key can be a composite key in an unclustered B+ tree index.
Page 2
3. {8 marks} This question is about external mergesort. Suppose we have 51,500 4K pages of
unsorted data on contiguous cylinders. Suppose further that there are 500 pages per cylinder on the
one-and-only disk drive, and you have the disk drive all to yourself (i.e., there is no contention from other
processes).
Like some of the online practice questions, let us only consider two types of seek operations: a long
seek (LS) and a short seek (SS). A short seek is a seek that is to an adjacent cylinder ONLY. In other
words, if we’re going directly from cylinder 8220 to cylinder 8218, then that’s not a short seek. Don’t
worry about computing any I/O times in milliseconds—for part (b), we’re only interested in the number
of long seeks and the number of short seeks. But, justify your answer by showing your work!
a) How many sorted runs will there be after the sort phase ONLY (i.e., the first phase), and what are the
sizes of the sorted runs?
b) Compute the number of long seeks (LS) as well as the number of short seeks (SS) created during
Phase 1 (the sort phase) of the external mergesort ONLY. Do not do anything for the merge phase(s).
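
As a rough illustration of the arithmetic behind part (a), the sketch below counts the sorted runs produced by a sort phase that fills the whole buffer, sorts it, and writes it out as one run. The question as printed here does not state the buffer size, so the 1,000-page buffer in the example is purely hypothetical, and the function name sort_phase_runs is made up for this sketch.

```python
def sort_phase_runs(total_pages, buffer_pages):
    """Return (number_of_runs, pages_in_last_run) after the sort phase.

    Assumes every run except possibly the last is exactly buffer_pages long.
    """
    full_runs, remainder = divmod(total_pages, buffer_pages)
    if remainder == 0:
        return full_runs, buffer_pages
    return full_runs + 1, remainder


# 51,500 pages comes from the question; the 1,000-page buffer is hypothetical.
print(sort_phase_runs(51_500, 1_000))  # (52, 500): 51 full runs plus one 500-page run
```

With a different buffer size the run count changes accordingly, so the actual answer depends on the buffer size intended for this question.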
Page 3
USE THE INFORMATION ON THIS PAGE FOR THE REST OF THE QUESTIONS IN THIS EXAM.
Consider the following relations, similar to the ones you’ve seen in class, but containing a bit of extra
information (you don’t need to know what most of the extra fields are):
For the indexes indicated below, assume 4 bytes for numeric fields, 8 bytes for strings, and 10 bytes for
rids/pointers. You can use up to B=15 buffer pages, unless told otherwise.
Assume the existence of the following indexes for this question. You won’t need them all!
Unless told otherwise, assume that it costs 1.2 I/Os to probe a hash index (i.e., to find the first bucket
containing a matching key), and 3 I/Os to probe a B+ tree index (i.e., to find the first leaf page
containing a matching key).
You can assume a uniform distribution for values in the tables; however, primary keys like B.bid,
S.sid, and R.bid+sid+rdate are, of course, unique.
Page 4
4. {4 marks} Compute the number of pages at each level of the unclustered B+ tree index for R.bid.
Show your work. Assume that each page is filled to capacity, where possible.
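
The level-by-level counting can be sketched as follows. This is only an illustration under stated assumptions: 4K (4096-byte) pages as in Question 3, 14-byte entries (a 4-byte numeric key plus a 10-byte rid/pointer), the same entry size at the internal levels, and pages filled to capacity. The number of Reserves tuples is not shown in this copy, so the 100,000 entries used in the example are hypothetical, as is the function name btree_level_pages.

```python
import math


def btree_level_pages(num_entries, entry_bytes=14, page_bytes=4096):
    """Return the page count per level, leaf level first, root last."""
    entries_per_page = page_bytes // entry_bytes  # 292 with the defaults
    pages = math.ceil(num_entries / entries_per_page)
    levels = [pages]
    # Each higher level needs one entry per page of the level below it.
    while pages > 1:
        pages = math.ceil(pages / entries_per_page)
        levels.append(pages)
    return levels


# Hypothetical entry count; substitute the real number of Reserves tuples.
print(btree_level_pages(100_000))  # [343, 2, 1]
```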
5) {4 marks} Suppose the Boats table included a field to indicate the age of the boat. Assuming
uniform distributions, if the optimizer knew that there were only 10 boat colors, and the ages of boats
ranged between 0 and 49 years (integer values), then how many boat tuples would qualify for the result
of the following query? Justify your answer by showing your work.
SELECT *
FROM Boats B
WHERE (B.color = 'red' AND B.age > 24)
OR
(B.age < 10);
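
One way to sanity-check a selectivity estimate for this predicate is to enumerate every (color, age) combination under the uniform assumption and count the ones that qualify. The sketch below does that; the nine non-red color names are placeholders, and the number of Boats tuples is left as a parameter because the table sizes are not shown in this copy.

```python
from fractions import Fraction
from itertools import product


def qualifies(color, age):
    """The WHERE clause of the query, written as a Python predicate."""
    return (color == "red" and age > 24) or (age < 10)


colors = ["red"] + [f"color_{i}" for i in range(1, 10)]  # 10 colors in total
ages = range(50)  # integer ages 0..49

combos = list(product(colors, ages))
matching = sum(qualifies(color, age) for color, age in combos)
selectivity = Fraction(matching, len(combos))
print(selectivity)  # 1/4 (125 of the 500 combinations)


def estimated_tuples(num_boats):
    """Scale the selectivity by the (not shown here) Boats cardinality."""
    return selectivity * num_boats
```

The two halves of the OR cannot both hold for the same tuple (an age cannot be both above 24 and below 10), which is why the enumeration gives 25 + 100 = 125 qualifying combinations with nothing to subtract for overlap.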
Page 5
6. {6 marks} Estimate the cost of computing the following SQL query using the query tree/plan shown
below, making reasonable decisions along the way. Assume that only 1 in 200 people who are making
the reservations are named Steven. Justify your answer by showing your work.
SELECT *
FROM Reserves R, Boats B
WHERE R.rname = 'Steven'
AND
R.bid = B.bid;
[Query plan figure: join (⋈) using INJ on bid, pipelined, with Reserves as an input]
7) {4 marks} (a) Suppose we have the following SQL query. Estimate (compute) the number of page
I/Os, assuming we make a good choice of query plan. You do not have to draw the query tree. Justify
your answer by showing your work.
SELECT count(*)
FROM Reserves R
WHERE R.bid = 500;
(b) If we changed “SELECT count(*)” to “SELECT *”, then would the number of page I/Os:
a) stay the same?
b) increase?
c) decrease?
Circle one of the above letters. There is no need to explain.
Page 6
8) {5 marks} Estimate the number of page I/Os to do a Sort-Merge Join (SMJ) on the Sailors and
Reserves tables. Note that they’ll join on the sid key. Assume that there are B = 80 buffer pages.
Justify your answer by showing your work.
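
For reference, a common textbook-style estimate for SMJ sorts both inputs with external mergesort (reading and writing each page once per pass) and then reads both sorted inputs once to merge them. The sketch below follows that simple version; it ignores refinements such as folding the final merge pass into the join, and the page counts in the example are hypothetical because the table sizes are not shown in this copy.

```python
import math


def sort_passes(pages, buffer_pages):
    """Number of passes of external mergesort (sort phase plus merge passes)."""
    runs = math.ceil(pages / buffer_pages)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffer_pages - 1))  # (B - 1)-way merge
        passes += 1
    return passes


def smj_cost(m_pages, n_pages, buffer_pages):
    """Sort both inputs (2 I/Os per page per pass), then one merging scan."""
    sort_cost = (2 * m_pages * sort_passes(m_pages, buffer_pages)
                 + 2 * n_pages * sort_passes(n_pages, buffer_pages))
    return sort_cost + m_pages + n_pages


# Hypothetical sizes: 500 and 1,000 pages, with B = 80 as in the question.
print(smj_cost(500, 1_000, 80))  # 7500
```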
9) {3 marks} Estimate the number of page I/Os to do an appropriate Block Nested Loop (BNL) join on
the Boats and Reserves tables. Note that they’ll join on the bid field. Assume that there are only B =
12 buffer pages. Show your work.
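
A standard way to estimate BNL is to read the outer table once and scan the inner table once per block of the outer, where a block is B - 2 pages (one page reserved for the inner input and one for output). The sketch below encodes that formula; the page counts in the example are hypothetical because the table sizes are not shown in this copy.

```python
import math


def bnl_cost(outer_pages, inner_pages, buffer_pages):
    """Page I/Os for block nested loop join with blocks of B - 2 outer pages."""
    block_pages = buffer_pages - 2
    outer_blocks = math.ceil(outer_pages / block_pages)
    return outer_pages + outer_blocks * inner_pages


# Hypothetical sizes: 100 and 1,000 pages, with B = 12 as in the question.
print(bnl_cost(100, 1_000, 12))  # 10100
print(bnl_cost(1_000, 100, 12))  # 11000
```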
Page 7
10) {4 marks} Estimate the number of page I/Os to do a Hash Join (HJ) on the Boats and Reserves
tables. Assume B = 25 buffer pages. Furthermore, assume that there is a 20% “fudge factor” required
for the size of the hash table. (In other words, just for the sake of an example, if you planned to build a
10-page hash table without a fudge factor, it will now take 20% more space, or 12 pages in all, to build
the hash table.) Justify your answer by showing your work.
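
The fudge-factor adjustment described above amounts to scaling the planned hash table size by 1.2 and rounding up to whole pages. A minimal sketch, with a made-up function name and integer arithmetic to avoid floating-point rounding surprises:

```python
def padded_pages(pages, fudge_percent=20):
    """Pages needed for a hash table once the fudge factor is applied."""
    scaled = pages * (100 + fudge_percent)
    return -(-scaled // 100)  # integer ceiling division


print(padded_pages(10))  # 12, matching the example in the question
print(padded_pages(7))   # 9 (8.4 pages rounded up)
```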
Page 8