
TUTORIAL 10

Week 11 Wang Lipo elpwang@ntu.edu.sg 1


1. In a college, the following are four dimension tables in a star schema used to
analyze grades achieved by students:
• CourseSection dimension table: CourseSectionID, CourseID, CourseName,
Units, RoomID, and RoomCapacity. During a given semester, the college offers
an average of 500 course sections.
• Professor dimension table: ProfessorID, ProfName, Title, DepartmentID, and
DepartmentName. There are typically 200 professors at the college at any given
time. On the average, 2 professors teach one course section. A professor teaches
about 5 course sections per period.
• Student dimension table: StudentID, StudentName, and Major. Each course
section has an average of 40 students, and students typically take 5 courses per
period. There are about 4000 students.
• Period dimension table: SemesterID and Year. The data warehouse contains
data for 30 periods (i.e., 3 semesters per year over a total of 10 years).
The only fact table is CourseGrade, which records which student took which
course section, taught by which professor, in which semester, and with what grade.
Do the following:
a) Design a basic star schema for storing the above-mentioned data in a data
warehouse.
b) Estimate the number of rows in the fact table, using the assumptions stated
previously.
c) Estimate the total size of the fact table (in bytes), assuming that each field has an
average of 5 bytes.
a) Design a basic star schema for storing the above-mentioned data in a data warehouse.
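One possible shape for such a star schema can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table and column names come from the question, while the column types, the Grade column name, the composite primary key, and the use of SemesterID as a key that identifies one of the 30 periods are assumptions.

```python
import sqlite3

# Hypothetical DDL: names follow the question, types and keys are assumptions.
SCHEMA = """
CREATE TABLE CourseSection (
    CourseSectionID INTEGER PRIMARY KEY,
    CourseID TEXT, CourseName TEXT, Units INTEGER,
    RoomID TEXT, RoomCapacity INTEGER
);
CREATE TABLE Professor (
    ProfessorID INTEGER PRIMARY KEY,
    ProfName TEXT, Title TEXT, DepartmentID TEXT, DepartmentName TEXT
);
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY, StudentName TEXT, Major TEXT
);
CREATE TABLE Period (
    SemesterID INTEGER PRIMARY KEY, Year INTEGER
);
-- Fact table: one foreign key per dimension plus the measured grade.
CREATE TABLE CourseGrade (
    CourseSectionID INTEGER REFERENCES CourseSection(CourseSectionID),
    ProfessorID INTEGER REFERENCES Professor(ProfessorID),
    StudentID INTEGER REFERENCES Student(StudentID),
    SemesterID INTEGER REFERENCES Period(SemesterID),
    Grade TEXT,
    PRIMARY KEY (CourseSectionID, ProfessorID, StudentID, SemesterID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the fact table's columns: four dimension keys and one measure.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(CourseGrade)")]
print(fact_columns)
```

In this sketch the fact table holds five fields per row, which is the figure used when estimating the fact table size in part c).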

b) Estimate the number of rows in the fact table, using the assumptions stated previously.
c) Estimate the total size of the fact table (in bytes), assuming that each field has an average of 5
bytes.
b)
Considering course sections: 500 course sections per period x 2 professors per section x 40
students per section x 30 periods = 1,200,000 rows in the fact table.
Alternatively, considering students: 4000 students x 5 course sections per student per period x 2
professors per course section x 30 periods = 1,200,000 rows in the fact table.
Alternatively, considering professors: 200 professors x 5 course sections per professor per period x 40
students per course section x 30 periods = 1,200,000 rows in the fact table.
c)
1,200,000 rows x 5 fields per row (four dimension keys plus the grade) x 5 bytes per field = 30,000,000 bytes for the fact table.
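The three row-count estimates and the size estimate can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the count of 5 fields per row assumes four dimension keys plus the grade.

```python
# Figures taken from the question statement.
sections_per_period = 500
professors_per_section = 2
students_per_section = 40
periods = 30
students = 4000
sections_per_student = 5
professors = 200
sections_per_professor = 5

# Three ways of counting (student, professor, section, period) combinations.
by_section = sections_per_period * professors_per_section * students_per_section * periods
by_student = students * sections_per_student * professors_per_section * periods
by_professor = professors * sections_per_professor * students_per_section * periods

# Assumption: 5 fields per row = four dimension keys plus the grade.
fields_per_row = 5
bytes_per_field = 5
fact_table_bytes = by_section * fields_per_row * bytes_per_field

print(by_section, by_student, by_professor)  # 1200000 1200000 1200000
print(fact_table_bytes)                      # 30000000
```

The three counts agree because the averages in the question are mutually consistent: 500 sections x 40 students = 4000 students x 5 sections, and 500 sections x 2 professors = 200 professors x 5 sections.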

2. In addition to the information given in Question 1, we further assume
that:
• RoomID uniquely identifies RoomCapacity.
• CourseID uniquely identifies CourseName and Units. Each course
may be assigned to multiple rooms and each room may be assigned
multiple courses.
• DepartmentID uniquely identifies DepartmentName. Each professor
works in only one department.
Discuss the advantages and disadvantages of using normalized
dimension tables in star schemas for data warehouses. Assuming we
don't frequently access this information together, normalize the
dimension tables in the star schema designed in Question 1.

2. Discuss the advantages and disadvantages in using normalized dimension
tables in star schemas for data warehouses.

Advantages: Normalized dimension tables save storage (because data duplication is
minimized) and avoid the inconsistencies that can arise when duplicated data are
updated.

Disadvantages: Normalized dimension tables may reduce the performance (i.e., access
speed) of some queries, because the tables must be joined at query time.
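The trade-off can be illustrated with a small sqlite3 sketch that stores the same professor data once in a flat dimension table and once in normalized form. The table names ProfessorFlat, ProfessorNorm, and Department, and the sample rows, are hypothetical and chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension: DepartmentName repeated for every professor.
CREATE TABLE ProfessorFlat (ProfessorID INTEGER PRIMARY KEY, ProfName TEXT,
                            DepartmentID TEXT, DepartmentName TEXT);
-- Normalized form: DepartmentName stored once per department.
CREATE TABLE Department (DepartmentID TEXT PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE ProfessorNorm (ProfessorID INTEGER PRIMARY KEY, ProfName TEXT,
                            DepartmentID TEXT REFERENCES Department(DepartmentID));
""")

# Made-up sample rows: two professors in the same department.
conn.executemany("INSERT INTO ProfessorFlat VALUES (?, ?, ?, ?)",
                 [(1, "Prof A", "D1", "Dept One"), (2, "Prof B", "D1", "Dept One")])
conn.execute("INSERT INTO Department VALUES ('D1', 'Dept One')")
conn.executemany("INSERT INTO ProfessorNorm VALUES (?, ?, ?)",
                 [(1, "Prof A", "D1"), (2, "Prof B", "D1")])

# Disadvantage: the normalized form needs a join to answer the same query.
flat = conn.execute(
    "SELECT ProfName, DepartmentName FROM ProfessorFlat ORDER BY ProfessorID").fetchall()
norm = conn.execute("""
    SELECT p.ProfName, d.DepartmentName
    FROM ProfessorNorm p JOIN Department d ON d.DepartmentID = p.DepartmentID
    ORDER BY p.ProfessorID""").fetchall()

# Advantage: renaming a department touches one row instead of one row per professor.
flat_updates = conn.execute(
    "UPDATE ProfessorFlat SET DepartmentName = 'Dept 1' WHERE DepartmentID = 'D1'").rowcount
norm_updates = conn.execute(
    "UPDATE Department SET DepartmentName = 'Dept 1' WHERE DepartmentID = 'D1'").rowcount

print(flat == norm, flat_updates, norm_updates)  # True 2 1
```

Both layouts return the same answer, but the flat table needs no join, while the normalized tables need only a single-row update when a department name changes.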

2. Assuming we don't frequently access this information together, normalize the dimension tables
in the star schema designed in Question 1.
• RoomID uniquely identifies RoomCapacity.
• CourseID uniquely identifies CourseName and Units. Each course may be assigned to multiple rooms
and each room may be assigned multiple courses.
• DepartmentID uniquely identifies DepartmentName. Each professor works in only one department.

3. Assuming that the college stated in Question 2 now wants to include the following new data
about course sections:
• The department offering the course (each course is offered by only one department)
• The school to which the department reports (each department reports to only one school)
Change the star schema designed in Question 2 to cater for the new data.

4. A manufacturing company needs a data warehouse to store
data for each fiscal period and summarize facts about the
following types of goods movement:
a) Internal transfers of goods, i.e., between plants, and from
plants to storages;
b) Orders by customers from storages;
c) Returns of goods from customers to storages;
d) Purchases from vendors to plants.
The company needs to treat customers, vendors, plants, and
storages as distinct dimensions that may be involved at either
end or both ends of a movement event, i.e., destination and/or
origin. For each type of destination or origin, the company
wants to know the name, city, and state. Facts about each
movement include the dollar value and volume moved, the cost of
the movement, and the revenue collected from the move (if any;
this can be negative for a return). Design a star schema to
represent this data warehouse directly (without generalization).
Then simplify the resulting star schema through generalization.

Design a star schema to represent this
data warehouse directly (without
generalization).
4. Simplify the resulting star schema through generalization.

We could simplify this star schema substantially by generalizing both the
dimension tables and the fact tables.

The dimension tables Customer, Plant, Storage, and Vendor can be generalized
into a single Location table, in which ObjectType can be Customer, Plant,
Storage, or Vendor.

The generalized fact table describes movements from one location to another,
and contains an origin key and a destination key. For example, if a customer
orders some items, the origin key would be a storage ID and the destination key
would be a customer ID. TxnType (transaction type) labels the type of
transaction that occurred (i.e., order, return, transfer, or purchase).
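The generalized design described above might look roughly as follows in sqlite3. Only Location, ObjectType, and TxnType are named in the answer; the fact table name Movement, the remaining column names, the FiscalPeriod column standing in for a period dimension, and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One generalized dimension replaces Customer, Plant, Storage, and Vendor.
CREATE TABLE Location (
    LocationID INTEGER PRIMARY KEY,
    ObjectType TEXT CHECK (ObjectType IN ('Customer', 'Plant', 'Storage', 'Vendor')),
    Name TEXT, City TEXT, State TEXT
);
-- One generalized fact table: both ends of a movement reference Location.
CREATE TABLE Movement (
    FiscalPeriod TEXT,  -- assumption: stands in for a period dimension key
    OriginID INTEGER REFERENCES Location(LocationID),
    DestinationID INTEGER REFERENCES Location(LocationID),
    TxnType TEXT CHECK (TxnType IN ('order', 'return', 'transfer', 'purchase')),
    DollarValue REAL, Volume REAL, Cost REAL, Revenue REAL
);
""")

# Made-up sample data: one storage, one customer, an order and its return.
conn.executemany("INSERT INTO Location VALUES (?, ?, ?, ?, ?)", [
    (1, "Storage", "Storage A", "City X", "State Y"),
    (2, "Customer", "Customer B", "City X", "State Y"),
])
conn.executemany("INSERT INTO Movement VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("FY1-P1", 1, 2, "order", 120.0, 10.0, 5.0, 120.0),
    ("FY1-P1", 2, 1, "return", 120.0, 10.0, 5.0, -120.0),  # negative revenue
])

# Resolve both ends of each movement through the same Location table.
movements = conn.execute("""
    SELECT m.TxnType, o.ObjectType, d.ObjectType, m.Revenue
    FROM Movement m
    JOIN Location o ON o.LocationID = m.OriginID
    JOIN Location d ON d.LocationID = m.DestinationID
    ORDER BY m.rowid
""").fetchall()
print(movements)
```

The order flows from a Storage location to a Customer location, and the return flows the other way with negative revenue, so a single fact table covers all four movement types.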

5. An international pharmaceutical company operates a network of 300
chain drug stores all over the world. The company is setting up a drug
data warehouse to store daily information for a period of 10 years for
drug sales analysis. The total sales of drugs (Total_Cost_Value and
Total_Sales_Value) per day for each drug and for each store should be
kept in the data warehouse. There is an average of 50 different drugs
sold by each store per day. There are a total of 500 different drugs. On
average, each drug is sold 30 times per day. Data for the data warehouse
are extracted from the company database. There are three relevant
tables in the database:
DRUG (Drug_ID, Drug_Name, Unit_Price, Unit_Cost)
SALES (Drug_ID, Store_ID, Sales_Date, Qty_Sold)
STORE (Store_ID, Address, Country)
Do the following:
a) Design and draw a star schema to represent the data warehouse
accurately for the company.
b) Estimate the number of rows in the fact table in part 5(a).

5. The total sales of drugs (Total_Cost_Value and Total_Sales_Value) per day for
each drug and for each store should be kept in the data warehouse. There are three
relevant tables in the database:
DRUG (Drug_ID, Drug_Name, Unit_Price, Unit_Cost)
SALES (Drug_ID, Store_ID, Sales_Date, Qty_Sold)
STORE (Store_ID, Address, Country)
a) Design and draw a star schema to represent the data warehouse accurately for
the company.
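As a rough sketch of how such a fact table could be populated, the daily totals can be aggregated from the three source tables with sqlite3. The fact table name DrugSalesFact, the column types, the sample rows, and the derivation of the totals as Qty_Sold x Unit_Cost and Qty_Sold x Unit_Price are assumptions; the question does not state how the totals are computed or whether prices change over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source tables as given in the question (types are assumptions).
CREATE TABLE DRUG  (Drug_ID INTEGER PRIMARY KEY, Drug_Name TEXT,
                    Unit_Price REAL, Unit_Cost REAL);
CREATE TABLE STORE (Store_ID INTEGER PRIMARY KEY, Address TEXT, Country TEXT);
CREATE TABLE SALES (Drug_ID INTEGER, Store_ID INTEGER, Sales_Date TEXT, Qty_Sold INTEGER);
-- Hypothetical fact table: one row per drug, per store, per day.
CREATE TABLE DrugSalesFact (
    Drug_ID INTEGER, Store_ID INTEGER, Sales_Date TEXT,
    Total_Cost_Value REAL, Total_Sales_Value REAL,
    PRIMARY KEY (Drug_ID, Store_ID, Sales_Date)
);
""")

# Made-up sample data: one drug, one store, three sales over two days.
conn.execute("INSERT INTO DRUG VALUES (1, 'DrugA', 10.0, 6.0)")
conn.execute("INSERT INTO STORE VALUES (100, 'Some Street 1', 'SG')")
conn.executemany("INSERT INTO SALES VALUES (?, ?, ?, ?)", [
    (1, 100, "2024-01-01", 2),
    (1, 100, "2024-01-01", 3),
    (1, 100, "2024-01-02", 1),
])

# Aggregate individual sales into daily totals per drug and store.
conn.execute("""
    INSERT INTO DrugSalesFact
    SELECT s.Drug_ID, s.Store_ID, s.Sales_Date,
           SUM(s.Qty_Sold * d.Unit_Cost),
           SUM(s.Qty_Sold * d.Unit_Price)
    FROM SALES s JOIN DRUG d ON d.Drug_ID = s.Drug_ID
    GROUP BY s.Drug_ID, s.Store_ID, s.Sales_Date
""")

fact_rows = conn.execute("SELECT * FROM DrugSalesFact ORDER BY Sales_Date").fetchall()
print(fact_rows)
```

Three sales rows collapse into two fact rows here, which is why the fact table size depends on the number of drug, store, and day combinations rather than on the number of individual sales.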

b) Estimate the number of rows in the fact table in part 5(a).

Considering the stores, the estimated number of rows in the fact table = 300 stores *
50 drugs per store per day * 365 days per year * 10 years = 54,750,000 rows.

Alternatively, considering the drugs, the estimated number of rows in the fact table =
500 drugs * 30 times per day (i.e., each drug appears in the daily totals of about 30 stores)
* 365 days per year * 10 years = 54,750,000 rows.
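Both estimates can be verified with a short Python calculation. This is a minimal sketch with illustrative variable names; it reads "sold 30 times per day" as each drug being sold in about 30 stores per day, which is an interpretation rather than something the question states outright.

```python
# Figures taken from the question statement.
stores = 300
drugs_per_store_per_day = 50
total_drugs = 500
stores_per_drug_per_day = 30  # assumption: "sold 30 times per day" means 30 stores
days = 365 * 10               # 10 years, ignoring leap days

# One fact row per (drug, store, day) combination with sales.
rows_by_store = stores * drugs_per_store_per_day * days
rows_by_drug = total_drugs * stores_per_drug_per_day * days

print(rows_by_store, rows_by_drug)  # 54750000 54750000
```

The two counts match because 300 stores x 50 drugs and 500 drugs x 30 stores both give 15,000 drug-store combinations per day.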
