Experiment 2 E059 DWM


LAB Manual

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No.02
Aim: Design a star schema, snowflake schema and fact constellation schema for any subject of
your choice.

Prerequisite:
Fundamental Knowledge of Database Management
Fundamental Knowledge of SQL

Learning Outcomes:
Learning of star, snowflake and fact constellation (galaxy) schemas

Theory:
Dimensional modeling:
Dimensional modeling is a logical design technique often used for data warehouses. It always uses the concepts of facts, measures and dimensions. Facts are typically (but not always) numeric values that can be aggregated; dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc.

Fact table:
The fact table is not a typical relational database table, as it is denormalized on purpose to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment.

The primary key of the fact table is a composite of all the columns except the numeric values/scores (like QUANTITY, TURNOVER, exact invoice date and time). Typical fact tables in a global enterprise data warehouse are listed below (there are usually additional company- or business-specific fact tables):

Sales fact table - contains all details regarding sales.
Orders fact table - in some cases the table can be split into open orders and historical orders.
Budget fact table - usually grouped by month and loaded once at the end of a year.
Forecast fact table - usually grouped by month and loaded daily, weekly or monthly.
Inventory fact table - reports stock levels, usually refreshed daily.

Dimension table:
Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining dimension tables is to allow browsing the categories quickly and easily.

The primary keys of the dimension tables together form the composite primary key of the fact table. In a star schema design, there is only one denormalized table for a given dimension.
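To make the composite-key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The retail tables (time_dim, product_dim, store_dim, sales_fact), their columns and the sample rows are hypothetical; the point is only that the fact table's primary key is built from the dimension keys, so a second row for the same time, product and store is rejected.

```python
import sqlite3

# Hypothetical retail example; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE time_dim    (time_id    INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- The dimension keys together form the fact table's composite primary key;
-- the numeric measures (quantity, turnover) stay outside the key.
CREATE TABLE sales_fact (
    time_id    INTEGER REFERENCES time_dim(time_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    store_id   INTEGER REFERENCES store_dim(store_id),
    quantity   INTEGER,
    turnover   REAL,
    PRIMARY KEY (time_id, product_id, store_id)
);

INSERT INTO time_dim    VALUES (1, '2016-08-01');
INSERT INTO product_dim VALUES (10, 'Pleated pants');
INSERT INTO store_dim   VALUES (100, 'Mumbai');
INSERT INTO sales_fact  VALUES (1, 10, 100, 3, 29.97);
""")

try:
    # Same (time, product, store) combination again: rejected by the composite key.
    conn.execute("INSERT INTO sales_fact VALUES (1, 10, 100, 5, 49.95)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("duplicate rejected:", duplicate_rejected)
```

The measures can differ between the two rows; what makes the second insert fail is that its key columns repeat an existing combination.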

Typical dimension tables in a data warehouse are:


Time dimension table
Customers dimension table
Products dimension table
Key account managers (KAM) dimension table
Sales office dimension table

Star schema architecture:

Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table, surrounded by dimension tables that allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Despite being the simplest data warehouse architecture, the star schema is the one most commonly used in data warehouse implementations across the world today (about 90-95% of cases).
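A minimal star schema sketch, again with sqlite3 and with hypothetical tables and made-up rows: each dimension is a single denormalized table (brand and category sit directly in product_dim), so summarizing turnover by category and quarter needs only one join per dimension.

```python
import sqlite3

# Hypothetical star schema: one fact table, each dimension a single denormalized table.
# All names and rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT,
                          brand TEXT, category TEXT);  -- brand kept inline (denormalized)
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact  (time_id INTEGER, product_id INTEGER, store_id INTEGER,
                          quantity INTEGER, turnover REAL,
                          PRIMARY KEY (time_id, product_id, store_id));

INSERT INTO time_dim    VALUES (1, 'Q1', 2016), (2, 'Q2', 2016);
INSERT INTO product_dim VALUES (10, 'Pleated pants', 'BrandA', 'Pants'),
                               (11, 'Denim jeans',   'BrandB', 'Pants'),
                               (20, 'Polo shirt',    'BrandA', 'Shirts');
INSERT INTO store_dim   VALUES (100, 'Mumbai'), (101, 'Pune');
INSERT INTO sales_fact  VALUES (1, 10, 100, 2, 40.0), (1, 20, 101, 1, 15.0),
                               (2, 11, 100, 3, 90.0), (2, 10, 101, 1, 20.0);
""")

# Every dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT p.category, t.quarter, SUM(f.turnover) AS turnover
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim    t ON t.time_id    = f.time_id
    GROUP BY p.category, t.quarter
    ORDER BY p.category, t.quarter
""").fetchall()
print(rows)
```

With the sample rows, the query returns [('Pants', 'Q1', 40.0), ('Pants', 'Q2', 110.0), ('Shirts', 'Q1', 15.0)].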

Snowflake Schema architecture:

Snowflake schema architecture is a more complex variation of the star schema design. The main difference is that the dimension tables in a snowflake schema are normalized, so they have a typical relational database design.

Snowflake schemas are generally used when a dimension table becomes very big and when a star schema cannot represent the complexity of a data structure. For example, if a PRODUCT dimension table contains millions of rows, a snowflake design can improve performance by moving some of the data out to another table (a BRANDS table, for instance). The drawback is that the more normalized the dimension tables are, the more complicated the SQL joins needed to query them, because many tables must be joined before a query can be answered and aggregates generated.
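The snowflake trade-off can be sketched the same way. Here a hypothetical PRODUCT dimension is normalized so that brand data lives in a separate brand_dim table (mirroring the BRANDS example above); the tables and rows are assumptions for illustration. Summarizing turnover by brand now takes an extra join through product_dim.

```python
import sqlite3

# Hypothetical snowflaked PRODUCT dimension: brand attributes moved to brand_dim.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brand_dim   (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT,
                          brand_id INTEGER REFERENCES brand_dim(brand_id));
CREATE TABLE sales_fact  (time_id INTEGER, product_id INTEGER, quantity INTEGER, turnover REAL);

INSERT INTO brand_dim   VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO product_dim VALUES (10, 'Pleated pants', 1), (11, 'Denim jeans', 2),
                               (20, 'Polo shirt', 1);
INSERT INTO sales_fact  VALUES (1, 10, 2, 40.0), (1, 20, 1, 15.0), (2, 11, 3, 90.0);
""")

# Turnover per brand needs two joins (fact -> product -> brand); in a star
# schema with brand stored inline in product_dim, one join would suffice.
rows = conn.execute("""
    SELECT b.brand_name, SUM(f.turnover) AS turnover
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN brand_dim   b ON b.brand_id   = p.brand_id
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()
print(rows)
```

With the sample rows this gives [('BrandA', 55.0), ('BrandB', 90.0)].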

Fact constellation/Galaxy schema Architecture:

For each star schema or snowflake schema, it is possible to construct a fact constellation schema. This schema is more complex than the star or snowflake architecture because it contains multiple fact tables, which allows dimension tables to be shared among many fact tables.

In a fact constellation schema, different fact tables are explicitly assigned to the dimensions that are relevant for the given facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level. For example, this model is reasonable when there is a sales fact table (with details down to the exact date and invoice header id) and a sales forecast fact table calculated on month, client id and product id.
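The sales/forecast example can be sketched as two fact tables sharing the same client and product dimensions. This is a hypothetical sqlite3 sketch: table names, columns and rows are assumptions, and dates are stored as plain text rather than in a separate time dimension to keep it short. Daily sales are rolled up to the forecast's monthly grain before the two fact tables are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables shared by both fact tables (hypothetical).
CREATE TABLE client_dim  (client_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table 1: sales at the finest grain (exact date and invoice header).
CREATE TABLE sales_fact (
    invoice_id INTEGER, sale_date TEXT,
    client_id  INTEGER REFERENCES client_dim(client_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    amount     REAL);

-- Fact table 2: forecast at a coarser grain (month, client, product).
CREATE TABLE forecast_fact (
    month      TEXT,
    client_id  INTEGER REFERENCES client_dim(client_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    forecast_amount REAL,
    PRIMARY KEY (month, client_id, product_id));

INSERT INTO client_dim    VALUES (1, 'Client X');
INSERT INTO product_dim   VALUES (10, 'Pleated pants');
INSERT INTO sales_fact    VALUES (501, '2016-08-01', 1, 10, 30.0),
                                 (502, '2016-08-19', 1, 10, 45.0);
INSERT INTO forecast_fact VALUES ('2016-08', 1, 10, 80.0);
""")

# Roll daily sales up to the forecast's grain, then compare the two fact
# tables through the dimensions they share.
rows = conn.execute("""
    SELECT fc.month, c.name, p.name, s.actual, fc.forecast_amount
    FROM forecast_fact fc
    JOIN (SELECT substr(sale_date, 1, 7) AS month, client_id, product_id,
                 SUM(amount) AS actual
          FROM sales_fact
          GROUP BY substr(sale_date, 1, 7), client_id, product_id) s
      ON s.month = fc.month AND s.client_id = fc.client_id
     AND s.product_id = fc.product_id
    JOIN client_dim  c ON c.client_id  = fc.client_id
    JOIN product_dim p ON p.product_id = fc.product_id
""").fetchall()
print(rows)
```

For the sample data this yields one row, ('2016-08', 'Client X', 'Pleated pants', 75.0, 80.0): actual sales of 75.0 against a forecast of 80.0.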

Dimensions such as product, ship-to, promotion and time allow us to answer questions such as:

• In what regions of the country are pleated pants most popular? (fact table joined with the product and ship-to dimensions)
• What percentage of pants were bought with coupons, and how has that varied from quarter to quarter? (fact table joined with the promotion and time dimensions)
• How many pants were sold on holidays versus non-holidays? (fact table joined with the time dimension)
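The third question can be sketched as a query, assuming a hypothetical time dimension with an is_holiday flag and made-up rows. Because the question is about pants, this sketch also joins a product dimension to filter on category; that extra join is an assumption beyond the original note.

```python
import sqlite3

# Hypothetical tables: time_dim carries an is_holiday flag; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_id INTEGER PRIMARY KEY, day TEXT, is_holiday INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact  (time_id INTEGER, product_id INTEGER, quantity INTEGER);

INSERT INTO time_dim    VALUES (1, '2016-08-14', 0), (2, '2016-08-15', 1), (3, '2016-08-16', 0);
INSERT INTO product_dim VALUES (10, 'Pants'), (11, 'Shirts');
INSERT INTO sales_fact  VALUES (1, 10, 4), (2, 10, 9), (3, 10, 2), (2, 11, 1);
""")

# "How many pants were sold on holidays versus non-holidays?"
rows = conn.execute("""
    SELECT CASE t.is_holiday WHEN 1 THEN 'holiday' ELSE 'non-holiday' END AS day_type,
           SUM(f.quantity) AS pants_sold
    FROM sales_fact f
    JOIN time_dim    t ON t.time_id    = f.time_id
    JOIN product_dim p ON p.product_id = f.product_id
    WHERE p.category = 'Pants'
    GROUP BY day_type
    ORDER BY day_type
""").fetchall()
print(rows)
```

With these rows the result is [('holiday', 9), ('non-holiday', 6)].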

PART B
(PART B : TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical slot. The soft copy must be uploaded on Blackboard or emailed to the concerned lab in-charge faculty at the end of the practical in case there is no Blackboard access available.)

Roll No. E059 Name: Shubham Gupta


Class : Btech Comp. E Batch : E3
Date of Experiment: 1/8/16 Date of Submission: 8/8/16
Grade : Time of Submission:
Date of Grading:

B.1 Schemas Designed by student:


(Paste your schemas completed during the 2 hours of practical in the lab here)

Star

Time: Year, Quarter, Month, Week, Day, Time_ID
Customers: Customer_ID, Name, Age, Ticket Issued, Contact, Details, Category
Fact Table: Time_Id, Customer_ID, Airlines_ID, Seat_Number, Transaction_ID
Airlines: Airlines_ID, Name, Age, Ticket Issued, Contact, Details, Category, Membership
Seat: Seat_Number, Type, Cost, Position, Reserved
Pop_Destn: Ticket_Span, Max_Book, Price_change, Profit, Pop_time
Transaction: Transaction_ID, Method of Payment, Details of card
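As a rough sketch, the keyed tables of the airline star schema could be declared in SQLite as follows. The snake_case names (e.g. ticket_issued for "Ticket Issued", time_dim, booking_fact) and all column types are assumptions. Pop_Destn is left out because the schema does not show which key connects it to the fact table.

```python
import sqlite3

# Sketch of the airline star schema as SQLite DDL; names and types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER,
                       month INTEGER, week INTEGER, day INTEGER);
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                           ticket_issued TEXT, contact TEXT, details TEXT, category TEXT);
CREATE TABLE airline_dim (airlines_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                          ticket_issued TEXT, contact TEXT, details TEXT,
                          category TEXT, membership TEXT);
CREATE TABLE seat_dim (seat_number TEXT PRIMARY KEY, type TEXT, cost REAL,
                       position TEXT, reserved INTEGER);
CREATE TABLE transaction_dim (transaction_id INTEGER PRIMARY KEY,
                              method_of_payment TEXT, details_of_card TEXT);

-- Central fact table: one foreign key per dimension, as in the schema above.
CREATE TABLE booking_fact (
    time_id        INTEGER REFERENCES time_dim(time_id),
    customer_id    INTEGER REFERENCES customer_dim(customer_id),
    airlines_id    INTEGER REFERENCES airline_dim(airlines_id),
    seat_number    TEXT    REFERENCES seat_dim(seat_number),
    transaction_id INTEGER REFERENCES transaction_dim(transaction_id),
    PRIMARY KEY (time_id, customer_id, airlines_id, seat_number, transaction_id)
);
""")

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(booking_fact)")]
print(fact_columns)
```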

Snowflake

Time: Year, Quarter, Month, Week, Day, Time_ID
Customers: Customer_ID, Name, Age, Ticket Issued, Contact, Details, Category
Fact Table: Time_Id, Customer_ID, Airlines_ID, Seat_Number, Transaction_ID
Airlines: Airlines_ID, Name, Age, Ticket Issued, Contact, Details, Category, Membership, Seat_number
Pop_Destn: Ticket_Span, Max_Book, Price_change, Profit, Pop_time
Transaction: Transaction_ID, Method of Payment, Details of card
Seat: Seat_Number, Type, Cost, Position, Reserved

Fact Constellation

Time: Year, Quarter, Month, Week, Day, Time_ID
Customers: Customer_ID, Name, Age, Ticket Issued, Contact, Details
Fact Table: Time_Id, Customer_ID, Airlines_ID, Seat_Number, Transaction_ID
Airlines: Airlines_ID, Name, Age, Ticket Issued, Contact, Details, Category, Membership, Seat_number
Pop_Destn: Ticket_Span, Max_Book, Price_change, Profit, Pop_time
Transaction: Transaction_ID, Method of Payment, Details of card
Seat: Seat_Number, Cost, Position, Reserved
Seat Type: Seat_type, Privileges
Secondary Fact Table: Transaction_ID, Seat_type

Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.
• Here, slice is performed for the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube by fixing that one dimension to a single value; the remaining dimensions are kept.
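A minimal Python sketch of slicing, assuming the cube is held as a plain dictionary keyed by (location, time, item). The cities, quarters, items and cell values are synthetic and only for illustration; slice_cube is a hypothetical helper, not a library function.

```python
from itertools import product

# Hypothetical 3-D cube: (location, time, item) -> units sold.
# Cell values are synthetic: 100*location index + 10*quarter index + item index.
locations = ["Toronto", "Vancouver", "Chicago"]
quarters = ["Q1", "Q2", "Q3"]
items = ["Mobile", "Modem", "Phone"]
cube = {
    (loc, q, it): 100 * li + 10 * qi + ii
    for (li, loc), (qi, q), (ii, it) in product(
        enumerate(locations), enumerate(quarters), enumerate(items))
}

def slice_cube(cube, time):
    """Slice: fix the time dimension to a single value; location and item remain."""
    return {(loc, it): units for (loc, q, it), units in cube.items() if q == time}

q1 = slice_cube(cube, "Q1")
print(len(q1), q1[("Vancouver", "Modem")])  # prints: 9 101
```

The result is two-dimensional: the time coordinate disappears from the keys because every remaining cell belongs to Q1.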

Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.
The dice operation on the cube, based on the following selection criteria, involves three dimensions:

• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
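Dicing a dictionary-based cube can be sketched as below. The cube contents are synthetic, dice_cube is a hypothetical helper, and the selection mirrors the three criteria listed above; unlike slicing, all three dimensions remain in the result.

```python
from itertools import product

# Hypothetical synthetic cube: (location, time, item) -> units sold.
locations = ["Toronto", "Vancouver", "Chicago"]
quarters = ["Q1", "Q2", "Q3"]
items = ["Mobile", "Modem", "Phone"]
cube = {
    (loc, q, it): 100 * li + 10 * qi + ii
    for (li, loc), (qi, q), (ii, it) in product(
        enumerate(locations), enumerate(quarters), enumerate(items))
}

def dice_cube(cube, keep_locations, keep_times, keep_items):
    """Dice: keep the cells whose coordinates fall in the chosen set on every dimension."""
    return {
        (loc, q, it): units
        for (loc, q, it), units in cube.items()
        if loc in keep_locations and q in keep_times and it in keep_items
    }

sub_cube = dice_cube(cube, {"Toronto", "Vancouver"}, {"Q1", "Q2"}, {"Mobile", "Modem"})
print(len(sub_cube))  # prints: 8
```

Two values on each of three dimensions leave 2 x 2 x 2 = 8 cells.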

Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of the data. Consider the following diagram that shows the pivot operation. Here, the item and location axes of a 2-D slice are rotated.
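Pivoting a 2-D slice amounts to swapping its row and column axes. In this sketch the slice is a nested dictionary (items as rows, locations as columns) with made-up values, and pivot is a hypothetical helper.

```python
# Hypothetical 2-D slice with synthetic values: rows are items, columns are locations.
slice_2d = {
    "Mobile": {"Toronto": 10, "Vancouver": 20},
    "Modem":  {"Toronto": 30, "Vancouver": 40},
}

def pivot(table):
    """Rotate the axes: table[row][col] becomes result[col][row]."""
    result = {}
    for row, cols in table.items():
        for col, value in cols.items():
            result.setdefault(col, {})[row] = value
    return result

rotated = pivot(slice_2d)
for location, by_item in rotated.items():
    print(location, by_item)
```

Applying pivot twice returns the original layout, since rotating the axes does not change any cell values.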

2. Compare a dimensional table with a relational table.

Dimensional table | Relational table
Data is stored in multidimensional tables. | Data is stored in an RDBMS.
Cubes are the units of storage. | Tables are the units of storage.
Data is denormalized. | Data is normalized.
Non-volatile | Volatile
MDX is used to manipulate data. | SQL is used to manipulate data.
OLAP reports | Normal reports
Few tables and fact tables | Chain of tables and fact tables
3. What are the advantages of a snowflake schema?

• Dimension tables are normalized.
• Normalized tables are easier to maintain.
• In a snowflake schema, a dimension table can have one or more parent tables.
• Relationships between dimension tables are represented explicitly.
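The maintenance advantage can be illustrated by renaming a brand in both designs. In this hypothetical sqlite3 sketch, a star-style product table repeats the brand name on every product row, while the snowflake-style design stores it once in brand_dim; the tables and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style (denormalized) product dimension: brand name repeated on every row.
CREATE TABLE product_star (product_id INTEGER PRIMARY KEY, name TEXT, brand_name TEXT);
-- Snowflake-style: brand stored once, products point to it.
CREATE TABLE brand_dim    (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE product_snow (product_id INTEGER PRIMARY KEY, name TEXT,
                           brand_id INTEGER REFERENCES brand_dim(brand_id));

INSERT INTO product_star VALUES (10, 'Pleated pants', 'BrandA'), (11, 'Chinos', 'BrandA'),
                                (12, 'Polo shirt', 'BrandA');
INSERT INTO brand_dim    VALUES (1, 'BrandA');
INSERT INTO product_snow VALUES (10, 'Pleated pants', 1), (11, 'Chinos', 1),
                                (12, 'Polo shirt', 1);
""")

# Renaming the brand: every product row in the star design vs one row in the snowflake.
star_rows = conn.execute(
    "UPDATE product_star SET brand_name = 'BrandA Plus' WHERE brand_name = 'BrandA'").rowcount
snow_rows = conn.execute(
    "UPDATE brand_dim SET brand_name = 'BrandA Plus' WHERE brand_id = 1").rowcount
print(star_rows, snow_rows)
```

Here the star-style update touches 3 rows and the snowflake-style update only 1, so there is less room for inconsistent copies of the same brand name.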
