
48 2 Database Concepts

2.14 Describe the different operations of the relational algebra. Elaborate
on the differences between the various types of joins. How can a join be
expressed in terms of other operations of the relational algebra?
2.15 What is SQL? What are the sublanguages of SQL?
2.16 What is the general structure of SQL queries? How can the semantics
of an SQL query be expressed with the relational algebra?
2.17 Discuss the differences between the relational algebra and SQL. Why is
relational algebra an operational language, whereas SQL is a declarative
language?
2.18 Explain what duplicates are in SQL and how they are handled.
2.19 Describe the general structure of SQL queries with aggregation and
sorting. State the basic aggregation operations provided by SQL.
2.20 What are subqueries in SQL? Give an example of a correlated subquery.
2.21 What are CTEs in SQL? What are they needed for?
2.22 What is the objective of physical database design? Explain some factors
that can be used to measure the performance of database applications
and the trade-offs that have to be resolved.
2.23 Explain different types of file organization. Discuss their respective
advantages and disadvantages.
2.24 What is an index? Why are indexes needed? Explain the various types
of indexes.
2.25 What is clustering? What is it used for?

2.9 Exercises

2.1 A French horse race fan wants to set up a database to analyze the
performance of the horses as well as the betting payoffs.
A racetrack is described by a name (e.g., Hippodrome de Chantilly),
a location (e.g., Chantilly, Oise, France), an owner, a manager, a date
opened, and a description. A racetrack hosts a series of horse races.
A horse race has a name (e.g., Prix Jean Prat), a category (i.e., Group
1, 2, or 3), a race type (e.g., thoroughbred flat racing), a distance (in
meters), a track type (e.g., turf right-handed), qualification conditions
(e.g., 3-year-old excluding geldings), and the first year it took place.
A meeting is held on a certain date at a racetrack and is composed
of one or several races. For a meeting, the following information is kept:
weather (e.g., sunny, stormy), temperature, wind speed (in km per hour),
and wind direction (N, S, E, W, NE, etc.).
Each race of a meeting is given a number and a departure time and
has a number of horses participating in it. The application must keep
track of the purse distribution, that is, how the amount of prize money
is distributed among the top places (e.g., first place: €228,000; second
place: €88,000, etc.), and the time of the fastest horse.

Each race at a date offers several betting types (e.g., tiercé, quarté+),
each type offering zero or more betting options (e.g., in order, in any
order, and bonus for the quarté+). The payoffs are given for a betting
type and a base amount (e.g., quarté+ for €2) and specify for each option
the win amount and the number of winners.
A horse has a name, a breed (e.g., thoroughbred), a sex, a foaling date
(i.e., birth date), a gelding date (i.e., castration date for male horses, if
any), a death date (if any), a sire (i.e., father), a dam (i.e., mother), a
coat color (e.g., bay, chestnut, white), an owner, a breeder, and a trainer.
A horse that participates in a race with a jockey is assigned a number
and carries a weight according to the conditions attached to the race
or to equalize the difference in ability between the runners. Finally, the
arrival place and the margin of victory of the horses are kept by the
application.

(a) Design an ER schema for this application. If you need additional
information, you may look at the various existing French horse racing
web sites.
(b) Translate the ER schema above into the relational model. Indicate
the keys of each relation, the referential integrity constraints, and
the non-null constraints.
2.2 A Formula One fan club wants to set up a database to keep track of the
results of all the seasons since the first Formula One World championship
in 1950.
A season is held in a given year, between a starting date and an ending
date, has a number of races, and is described by a summary and a set
of regulations. A race has a round number (stating the ordering of the
race in a season), an official name (e.g., 2013 Formula One Shell Belgian
Grand Prix), a race date, a race time (expressed in both local and UTC
time), a description of the weather when the race took place, the pole
position (consisting of driver name and time realized), and the fastest
lap (consisting of driver name, time, and lap number).
Each race of a season belongs to a Grand Prix (e.g., Belgian Grand
Prix), for which the following information is kept: active years (e.g.,
1950–1956, 1958, etc. for the Belgian Grand Prix), total number of
races (58 races as of 2013 for the Belgian Grand Prix), and a short
historical description. The race of a season is held on a circuit, described
by its name (e.g., Circuit de Spa-Francorchamps), location (e.g., Spa,
Belgium), type (such as race, road, street), number of laps, circuit
length, race distance (the latter two expressed in kilometers), and lap
record (consisting of time, driver, and year). Notice that over the years,
the course of the circuits may be modified several times. For example,
the Spa-Francorchamps circuit was shortened from 14 to 7 km in 1979.
Further, a Grand Prix may use several circuits over the years. For
example, the Belgian Grand Prix has been held alternately at the
Spa-Francorchamps, Zolder, and Nivelles circuits.
A team has a name (e.g., Scuderia Ferrari), one or two bases (e.g.,
Maranello, Italy), and one or two current principals (e.g., Stefano
Domenicali). In addition, a team keeps track of its debut (the first
Grand Prix entered), the number of races competed, the number of
world championships won by constructor and by driver, the highest race
finish (consisting of place and number of times), the number of race
victories, the number of pole positions, and the number of fastest laps.
A team competing in a season has a full name, which typically includes
its current sponsor (e.g., Scuderia Ferrari Marlboro from 1997 to 2011),
a chassis (e.g., F138), an engine (e.g., Ferrari 056), and a tyre brand
(e.g., Bridgestone).
For each driver, the following information is kept: name, nationality,
birth date and birth place, number of races entered, number of
championships won, number of wins, number of podiums, total points in the
career, number of pole positions, number of fastest laps, highest race
finish (consisting of place and number of times), and highest grid position
(consisting of place and number of times). Drivers are hired by teams
competing in a season as either main drivers or test drivers. Each team
has two main drivers and usually two test drivers, but the number of
test drivers may vary from none to six. In addition, although a main
driver is usually associated with a team for the whole season, the driver
may participate in only some of the races of the season. A team participating
in a season is assigned two consecutive numbers for its main drivers,
where the number 1 is assigned to the team that won the constructor’s
world title the previous season. Further, the number 13 is usually not
given to a car; it has appeared only once, in the 1963 Mexican Grand Prix.
A driver participating in a Grand Prix must participate in a qualifying
session, which determines the starting order for the race. The results kept
for a driver participating in the qualifying session are the position and
the time realized for the three parts (called Q1, Q2, and Q3). Finally,
the results kept for a driver participating in a race are the following:
position (optional), number of laps, time, the reason why the
driver retired or was disqualified (both optional), and the number
of points (scored only for the top eight finishers).
(a) Design an ER schema for this application. In particular, state the
identifiers and the derived attributes. Note any unspecified
requirements and integrity constraints, and make appropriate assumptions
to complete the specification. If you need additional
information, you may look at the various existing Formula One web sites.
(b) Translate the ER schema above into the relational model. Indicate
the keys of each relation, the referential integrity constraints, and
the non-null constraints.
4.8 Exercises

4.4 Design a MultiDim schema for the university application given in Ex. 3.3,
taking into account the different granularities of the time dimension.
4.5 Design a MultiDim schema for the French horse race application given
in Ex. 2.1. With respect to the races, the application must be able to
display different statistics about the prizes won by owners, by trainers,
by jockeys, by breeders, by horses, by sires (i.e., fathers), and by
damsires (i.e., maternal grandfathers). With respect to the betting,
the application must be able to display different statistics about the
payoffs by type, by race, by racetrack, and by horses.
4.6 In each of the dimensions of the multidimensional schema of Ex. 4.5,
identify the hierarchies (if any) and determine their type.
4.7 Design a MultiDim schema for the Formula One application given in
Ex. 2.2. With respect to the races, the application must be able to
display different statistics about the prizes won by drivers, by teams, by
circuit, by Grand Prix, and by season.
4.8 Consider a time dimension composed of two alternative hierarchies: (a)
day, month, quarter, and year and (b) day, month, bimonth, and year.
Design the conceptual schema of this dimension and show examples of
instances.
4.9 Consider the well-known Foodmart cube whose schema is given in
Fig. 4.23. Using the OLAP operations, write the following queries²:
(a) All measures for stores.
(b) All measures for stores in the states of California and Washington
summarized at the state level.
(c) All measures for stores in the states of California and Washington
summarized at the city level.
(d) All measures, including the derived ones, for stores in the state of
California summarized at the state and the city levels.
(e) Sales average in 1997 by store state and store type.
(f) Sales profit by store and semester in 1997.
(g) Sales profit percentage by quarter and semester in 1997.
(h) Sales profit by store for the first quarter of each year.
(i) Unit sales by city and percentage of the unit sales of the city with
respect to its state.
(j) Unit sales by city and percentage of the unit sales of the city with
respect to its country.
(k) For promotions other than “No Promotion,” unit sales and
percentage of the unit sales of the promotion with respect to all promotions.
(l) Unit sales by promotion, year, and quarter.

² The queries of this exercise are based on a document written by Carl Nolan entitled
“Introduction to Multidimensional Expressions (MDX).”
5.14 Exercises

5.7 Translate the MultiDim schema obtained for the French horse race
application in Ex. 4.5 into the relational model.
5.8 Translate the MultiDim schema obtained for the Formula One
application in Ex. 4.7 into the relational model.
5.9 The Research and Innovative Technology Administration (RITA)¹
coordinates the US Department of Transportation’s (DOT) research
programs. It collects statistics about many modes of transportation,
including information about flight segments between airports,
summarized by month.²
There is a set of tables T_T100I_Segment_All_Carrier_XXXX, one per year,
ranging from 1990 to the present. These tables include information about
the scheduled and performed departures, the number of seats sold,
the freight transported, and the distance traveled, among others.
The schema and description of these tables are given in Table 5.1. A set
of lookup tables given in Table 5.2 includes information about airports,
carriers, and time. The schemas of these lookup tables are composed of
just two columns called Code and Description. The mentioned web site
describes all tables in detail.
From the information above, construct an appropriate data warehouse
schema. Analyze the input data and motivate the choice of your schema.
5.10 Implement in Analysis Services the MultiDim schema obtained for
the French horse race application in Ex. 4.5 and the relational data
warehouse obtained in Ex. 5.7.
5.11 Implement in Mondrian the MultiDim schema obtained for the Formula
One application in Ex. 4.7 and the relational data warehouse obtained
in Ex. 5.8.

¹ http://www.transtats.bts.gov/
² http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=261
