
Business analytics

Data Modeling Lecture 2


Key concepts
▪Data modeling
▪Multidimensional modeling
▪Star schema, snowflake schema
Data Modeling Basics
▪Entity: anything real or abstract for which data is to be stored. It can be a role, event, location, tangible thing or concept
▪Attributes: characteristic properties of an entity. An entity can have multiple attributes
▪Cardinality of relationship: one to one or one to many, e.g. person to chair (one to one), project to person (one to many)
Types of data model
▪Conceptual data model
▪Identifies most important entities
▪Identifies relationships between
different entities
▪Does not support the specification of
attributes
▪Does not support the specification of
the primary key (208)
Types of data model
▪ Logical data model
▪Identifies all entities and relationships
▪Identifies all the attributes for each
entity
▪Specifies the primary key for each
entity
▪Specifies the foreign keys
▪Normalization of entities is performed (214)
First Normal form rules
▪The table must be two dimensional
▪Each row contains data about one thing
or one part of the thing
▪Each column contains data for a single
attribute
▪All entries in the column must be of the same type
▪Each column must have a unique name
▪No two rows may be identical
▪Example: Name, Date of Birth, Address
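The first normal form rules above can be checked mechanically. A minimal sketch, assuming a table is held in memory as a list of column names plus a list of row tuples; `check_1nf` and the sample rows are hypothetical, and "atomic" is approximated here as "not a list, tuple, set or dict". Rules about meaning (each row describes one thing) cannot be checked this way.

```python
from typing import Any


def check_1nf(columns: list[str], rows: list[tuple[Any, ...]]) -> list[str]:
    """Return a list of 1NF rule violations (an empty list means the table passes)."""
    problems: list[str] = []
    # Each column must have a unique name.
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    # No two rows may be identical (pairwise comparison also works for unhashable values).
    if any(rows[i] == rows[j] for i in range(len(rows)) for j in range(i + 1, len(rows))):
        problems.append("duplicate rows")
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        # Each column holds a single attribute: no repeating groups inside a cell.
        if any(isinstance(v, (list, tuple, set, dict)) for v in values):
            problems.append(f"non-atomic values in column {name!r}")
        # All entries in a column must be of the same type.
        if len({type(v) for v in values}) > 1:
            problems.append(f"mixed types in column {name!r}")
    return problems


people_cols = ["Name", "DateOfBirth", "Address"]
good = [("Asha", "1990-04-12", "12 Park St"), ("Ravi", "1988-11-02", "4 Hill Rd")]
bad = [
    ("Asha", "1990-04-12", ["12 Park St", "9 Lake Rd"]),  # two addresses in one cell
    ("Asha", "1990-04-12", ["12 Park St", "9 Lake Rd"]),  # identical row
]
print(check_1nf(people_cols, good))  # []
print(check_1nf(people_cols, bad))   # ['duplicate rows', "non-atomic values in column 'Address'"]
```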
Second Normal form rules
▪2NF = 1NF + each column depends on the ENTIRE primary key.
▪The primary key can be made of one or more attributes.
▪Example: Name, DOB and Hobbies
▪Split into two 1NF tables: one with Name and DOB, and another with Name + DOB and Hobby.
▪https://www.youtube.com/watch?v=K7vzLrGCV50&list=PLQ9AAKW8HuJ5m0rmHKL88ZyjOIKejvrj0
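A sketch of the split in the example, assuming a person is identified by Name + DOB and can have several hobbies, so the original table's key is Name + DOB + Hobby. The City column is a hypothetical addition: it depends only on part of that key (Name + DOB), which is exactly the partial dependency 2NF forbids, so it moves to the people table. The function name and rows are illustrative.

```python
def split_2nf(rows):
    """Split (name, dob, city, hobby) rows whose key is (name, dob, hobby).

    city depends only on (name, dob), part of the key, so it moves to a
    people table; the hobbies table keeps the full composite key.
    """
    people = {}      # (name, dob) -> city, stored once per person
    hobbies = set()  # {(name, dob, hobby)}
    for name, dob, city, hobby in rows:
        # setdefault keeps the first city seen; a different one means the
        # redundant copies in the original table already disagreed.
        if people.setdefault((name, dob), city) != city:
            raise ValueError(f"conflicting city for {name} ({dob})")
        hobbies.add((name, dob, hobby))
    return people, hobbies


rows = [
    ("Asha", "1990-04-12", "Pune", "chess"),
    ("Asha", "1990-04-12", "Pune", "hiking"),  # city repeated: redundancy
    ("Ravi", "1988-11-02", "Delhi", "chess"),
]
people, hobbies = split_2nf(rows)
print(people)           # {('Asha', '1990-04-12'): 'Pune', ('Ravi', '1988-11-02'): 'Delhi'}
print(sorted(hobbies))  # one row per (person, hobby)
```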
Types of data model
▪ Logical data model
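▪1NF: a relation should have atomic (simple and indivisible) attributes.
▪2NF: applies when the primary key contains multiple attributes (composite primary key); every non-key attribute must depend on the whole key, not on part of it
▪3NF: a relation should not have a non-key attribute that depends on another non-key attribute (no transitive dependencies)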
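A sketch of removing a transitive dependency, using a hypothetical employee table keyed on emp_id in which dept_name depends on dept_id (a non-key attribute) rather than directly on the key. The table, column names and rows are illustrative.

```python
def split_3nf(rows):
    """rows: (emp_id, emp_name, dept_id, dept_name), keyed on emp_id.

    dept_name depends on dept_id, not directly on emp_id (a transitive
    dependency), so department data moves to its own table.
    """
    employees = {}    # emp_id -> (emp_name, dept_id)
    departments = {}  # dept_id -> dept_name, stored once
    for emp_id, emp_name, dept_id, dept_name in rows:
        employees[emp_id] = (emp_name, dept_id)
        if departments.setdefault(dept_id, dept_name) != dept_name:
            raise ValueError(f"conflicting names for department {dept_id}")
    return employees, departments


rows = [
    (1, "Asha", "D10", "Sales"),
    (2, "Ravi", "D10", "Sales"),   # "Sales" repeated before the split
    (3, "Meera", "D20", "Finance"),
]
employees, departments = split_3nf(rows)
print(departments)  # {'D10': 'Sales', 'D20': 'Finance'}
```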
Normalization vs denormalization
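▪The difference between normalization and denormalization is simple.
▪When data is normalized it exists in one and only one source-of-truth location.
▪Denormalized data exists in multiple summarized locations. Where data lives has important consequences for accuracy and speed: with one location, an update happens in one place and the data stays consistent; with many copies, queries need fewer joins and can run faster, but the copies can disagree if one is updated and another is not.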
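A small illustration of the accuracy trade-off, assuming a hypothetical customer/order data set held in Python dictionaries: the normalized version stores a customer's city once, while the denormalized version copies it into every order row.

```python
# Normalized: the customer's city is stored once (single source of truth).
customers = {"C1": {"name": "Asha", "city": "Pune"}}
orders = [("O1", "C1", 250.0), ("O2", "C1", 120.0)]

# Denormalized: the city is copied into every order row (no lookup needed to read it).
orders_denorm = [
    {"order": "O1", "customer": "Asha", "city": "Pune", "amount": 250.0},
    {"order": "O2", "customer": "Asha", "city": "Pune", "amount": 120.0},
]


def cities_normalized():
    # A "join": look up each order's customer to find the city.
    return {order_id: customers[cust_id]["city"] for order_id, cust_id, _ in orders}


def cities_denormalized():
    return {row["order"]: row["city"] for row in orders_denorm}


# The customer moves. Normalized: a single update is enough.
customers["C1"]["city"] = "Mumbai"
# Denormalized: suppose only one of the copies gets updated.
orders_denorm[0]["city"] = "Mumbai"

print(cities_normalized())    # {'O1': 'Mumbai', 'O2': 'Mumbai'}
print(cities_denormalized())  # {'O1': 'Mumbai', 'O2': 'Pune'}  (copies now disagree)
```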
Normalization vs denormalization
▪ Normalization is the process of
reorganizing data in a database so
that it meets two basic requirements:
▪ (1) There is no redundancy of data
(all data is stored in only one place),
and
▪(2) data dependencies are logical (all
related data items are stored
together).
Types of data model
▪Physical model: a representation of how the model will be built in the database. It shows all table structures, including column names, column data types, column constraints, primary keys, foreign keys and the relationships between tables
▪Convert entities into tables/relations
▪Convert relationships into foreign keys
▪Convert attributes into columns/fields
(218)
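A sketch of the three conversion steps using Python's built-in sqlite3 module, with the project-to-person (one to many) relationship from the basics slide. The table and column names are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Entity "Project" -> table; attributes -> typed columns; key -> PRIMARY KEY.
conn.execute("""
    CREATE TABLE project (
        project_id   INTEGER PRIMARY KEY,
        project_name TEXT NOT NULL
    )
""")
# Entity "Person" -> table. The one-to-many relationship (one project,
# many persons) becomes a foreign key on the "many" side.
conn.execute("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL,
        project_id  INTEGER NOT NULL REFERENCES project(project_id)
    )
""")
conn.execute("INSERT INTO project VALUES (1, 'Data warehouse')")
conn.execute("INSERT INTO person VALUES (10, 'Asha', 1)")

fk_rejected = False
try:
    conn.execute("INSERT INTO person VALUES (11, 'Ravi', 99)")  # project 99 does not exist
except sqlite3.IntegrityError as exc:
    fk_rejected = True
    print("rejected:", exc)  # rejected: FOREIGN KEY constraint failed
```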
Data Modelling techniques
▪ Entity relationship modeling
▪Identify all entities
▪Identify the relationships among entities along with their cardinality
▪Identify key attributes
▪Identify all the relevant attributes
▪Plot the ER diagram and review with
business users. (222)
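The ER steps can be recorded as plain data before drawing the diagram, which makes simple gaps easy to spot ahead of the review with business users. A minimal sketch; the `Entity` and `Relationship` classes, the `review` function and the cardinality notation ("1:1", "1:N", "N:M") are hypothetical choices, not a standard API.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    attributes: list[str]
    key: list[str]  # key attribute(s) identifying one instance


@dataclass
class Relationship:
    left: str
    right: str
    cardinality: str  # "1:1", "1:N" or "N:M"


def review(entities, relationships):
    """Return modelling problems to resolve before the ER diagram is reviewed."""
    names = {e.name for e in entities}
    problems = []
    for e in entities:
        if not e.key:
            problems.append(f"{e.name}: no key attribute")
        for k in e.key:
            if k not in e.attributes:
                problems.append(f"{e.name}: key {k!r} is not an attribute")
    for r in relationships:
        for side in (r.left, r.right):
            if side not in names:
                problems.append(f"{r.left}-{r.right}: unknown entity {side!r}")
        if r.cardinality not in ("1:1", "1:N", "N:M"):
            problems.append(f"{r.left}-{r.right}: unexpected cardinality {r.cardinality!r}")
    return problems


entities = [
    Entity("Project", ["project_id", "title"], ["project_id"]),
    Entity("Person", ["person_id", "name"], ["person_id"]),
]
relationships = [
    Relationship("Project", "Person", "1:N"),  # one project, many persons
    Relationship("Person", "Chair", "1:1"),    # Chair has not been modelled yet
]
print(review(entities, relationships))  # ["Person-Chair: unknown entity 'Chair'"]
```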
Data Modelling techniques
▪Dimensional modelling: used to understand a multidimensional view of the data (same data, multiple perspectives).
▪It is a logical design technique for structuring data to deliver fast query performance.
▪A dimensional database can be viewed as a cube of three or more dimensions for analyzing the data.
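To make "same data, multiple perspectives" concrete: a minimal sketch, assuming a hypothetical list of sales facts with three dimensions (product, country, quarter). Each call to `rollup` aggregates the same facts along a different subset of dimensions, i.e. a different view of the cube.

```python
from collections import defaultdict

# Each fact: (product, country, quarter, sales). Illustrative data only.
facts = [
    ("Laptop", "India", "Q1", 100),
    ("Laptop", "France", "Q1", 80),
    ("Phone", "India", "Q1", 50),
    ("Phone", "India", "Q2", 70),
]
DIMS = {"product": 0, "country": 1, "quarter": 2}


def rollup(by):
    """Sum sales grouped by the given dimensions (an empty list gives the grand total)."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[DIMS[d]] for d in by)
        totals[key] += row[3]
    return dict(totals)


print(rollup(["product"]))             # {('Laptop',): 180, ('Phone',): 120}
print(rollup(["country", "quarter"]))  # {('India', 'Q1'): 150, ('France', 'Q1'): 80, ('India', 'Q2'): 70}
print(rollup([]))                      # {(): 300}
```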
Data modelling techniques
▪Dimensional modelling divides the database into (1) measurements and (2) context.
▪Measurements are usually numeric
values called FACTS
▪Facts are related to various contexts.
▪ Contexts are divided into independent
logical clumps called DIMENSIONS
▪Dimensions describe the “who, what,
when, where, why and how” context of
measurements
Data modelling techniques
▪ Fact table:
▪Consists of various measurements.
▪Stores the measures of a business process and points to the lowest level of detail of each dimension table
▪Measures are factual/quantitative and are generally numeric
▪Additive facts (revenue), non-additive facts (temperature), semi-additive facts (inventory)
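The three kinds of fact behave differently under aggregation. A sketch with hypothetical daily figures for one store: revenue can be summed over time, inventory is a balance that should not be summed over time, and temperature should not be summed at all.

```python
# Daily facts for one store (illustrative values).
days = [
    {"day": 1, "revenue": 500, "inventory": 40, "temp": 30.0},
    {"day": 2, "revenue": 300, "inventory": 35, "temp": 34.0},
    {"day": 3, "revenue": 200, "inventory": 50, "temp": 26.0},
]

# Additive: summing over any dimension, including time, is meaningful.
total_revenue = sum(d["revenue"] for d in days)  # 1000

# Semi-additive: inventory can be summed across stores or products, but not
# across time; over a period use the closing (last) balance or an average.
closing_inventory = days[-1]["inventory"]                           # 50
average_inventory = sum(d["inventory"] for d in days) / len(days)   # 125 / 3, about 41.67

# Non-additive: a sum of temperatures means nothing; use e.g. an average.
average_temp = sum(d["temp"] for d in days) / len(days)  # 30.0
```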
Data modelling techniques
▪ Dimension table
▪Consists of dimensional (descriptive) attributes which describe the dimension elements to enhance comprehension
▪Typically static values containing
textual data or discrete numbers
▪Used for query filtering/constraining
Data modelling techniques
▪Dimension table attributes must be:
▪Verbose: consist of full words
▪Descriptive: convey their purpose in a few words
▪Complete: must not contain missing values
▪Discrete values: only one value per row
▪Quality assured: must not contain misspelt or impossible values
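Some of these properties can be checked automatically. A minimal sketch, assuming dimension rows are dictionaries and that a reference list of valid values exists for some columns (here the countries named in the sales exercise at the end of this lecture); `check_dimension` is hypothetical, and "verbose" and "descriptive" still need human review.

```python
def check_dimension(rows, allowed):
    """Check dimension rows (dicts) for the complete, discrete and
    quality-assured properties. `allowed` maps a column name to its set
    of valid values.
    """
    issues = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if value is None or value == "":
                issues.append((i, col, "missing"))          # complete
            elif isinstance(value, (list, tuple, set)):
                issues.append((i, col, "multiple values"))  # discrete values
            elif col in allowed and value not in allowed[col]:
                issues.append((i, col, "invalid value"))    # quality assured
    return issues


countries = {"India", "Australia", "Canada", "France", "Germany", "Denmark"}
rows = [
    {"country": "India", "region": "Asia"},
    {"country": "Frnace", "region": ""},               # misspelt, missing region
    {"country": ["Canada", "France"], "region": "Mixed"},  # two values in one row
]
print(check_dimension(rows, {"country": countries}))
# [(1, 'country', 'invalid value'), (1, 'region', 'missing'), (2, 'country', 'multiple values')]
```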
Data modelling techniques
▪Star schema
▪Consists of a large central fact table with no redundancy
▪The fact table is linked through foreign keys to a number of dimension tables (a.k.a. lookup or reference tables) (239)
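A minimal star schema built with Python's sqlite3 module; the table names, columns and rows are illustrative assumptions. The fact table holds a foreign key to each dimension plus numeric measures, and the query shows dimension attributes used for filtering while the facts are aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension (lookup) tables: descriptive, textual attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              brand TEXT, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    -- Central fact table: a foreign key to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        revenue     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Laptop X', 'BrandA', 'Electronics'),
                                   (2, 'Desk Y', 'BrandB', 'Furniture');
    INSERT INTO dim_date VALUES (20240101, 2024, 'Q1'), (20240401, 2024, 'Q2');
    INSERT INTO fact_sales VALUES (1, 20240101, 2, 1800.0),
                                  (2, 20240101, 1, 250.0),
                                  (1, 20240401, 3, 2700.0);
""")

# Dimension attributes constrain the query; facts are aggregated.
query = """
    SELECT d.quarter, SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    WHERE p.category = 'Electronics'
    GROUP BY d.quarter
    ORDER BY d.quarter
"""
result = conn.execute(query).fetchall()
print(result)  # [('Q1', 2, 1800.0), ('Q2', 3, 2700.0)]
```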
Data modelling techniques
▪ Snowflake schema:
▪Used when the dimensions of a star schema are detailed and highly structured, with several levels of relationship.
▪Results from further expansion and normalization of the dimension tables
▪The child tables may have multiple parent tables.
▪Snowflaking affects only the dimension tables and does not affect the fact table (241) (244)
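A sketch of snowflaking, assuming a product dimension normalized into category, sub-category and product tables (names and rows are illustrative). The fact table is unchanged; what changes is that reaching the category level takes an extra join per normalized level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked product dimension: category <- subcategory <- product.
    CREATE TABLE dim_category    (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY, subcategory_name TEXT,
                                  category_key INTEGER REFERENCES dim_category(category_key));
    CREATE TABLE dim_product     (product_key INTEGER PRIMARY KEY, product_name TEXT,
                                  subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key));
    -- The fact table looks the same as in a star schema: it still points at dim_product.
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product(product_key),
                             quantity INTEGER, revenue REAL);
    INSERT INTO dim_category VALUES (1, 'Electronics');
    INSERT INTO dim_subcategory VALUES (10, 'Laptops', 1), (11, 'Phones', 1);
    INSERT INTO dim_product VALUES (100, 'Laptop X', 10), (101, 'Phone Z', 11);
    INSERT INTO fact_sales VALUES (100, 2, 1800.0), (101, 5, 1500.0);
""")

# Rolling up to category needs one join per level of the normalized dimension.
rows = conn.execute("""
    SELECT c.category_name, s.subcategory_name, SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p     ON p.product_key = f.product_key
    JOIN dim_subcategory s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category c    ON c.category_key = s.category_key
    GROUP BY c.category_name, s.subcategory_name
    ORDER BY s.subcategory_name
""").fetchall()
print(rows)  # [('Electronics', 'Laptops', 2, 1800.0), ('Electronics', 'Phones', 5, 1500.0)]
```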
Data modelling life-cycle
▪ Life cycle consists of
▪Requirement gathering
▪Source driven
▪User driven
▪Identify the grain: the level of detail or fineness of the data. Granularity is the level of detail of the information stored in a table.
▪Identify dimensions
▪Identify the facts
▪Design the dimensional model
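Choosing the grain decides which questions the model can answer. A sketch, assuming hypothetical order-line facts (date, employee, team, quantity): because the stored grain is finer than a quarter, both the per-employee and the per-team quarterly views can be derived from it.

```python
from collections import defaultdict
from datetime import date

# Fact rows at the finest grain chosen here: one row per order line.
# Columns: order date, employee, team, order quantity (illustrative data).
order_lines = [
    (date(2024, 1, 15), "E1", "North", 5),
    (date(2024, 2, 3), "E2", "North", 2),
    (date(2024, 4, 20), "E1", "North", 4),
    (date(2024, 5, 9), "E3", "South", 7),
]


def quarter(d: date) -> str:
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def rollup(level):
    """Aggregate order quantity per quarter and per employee or per team."""
    idx = {"employee": 1, "team": 2}[level]
    totals = defaultdict(int)
    for row in order_lines:
        totals[(quarter(row[0]), row[idx])] += row[3]
    return dict(totals)


print(rollup("employee"))  # {('2024-Q1', 'E1'): 5, ('2024-Q1', 'E2'): 2, ('2024-Q2', 'E1'): 4, ('2024-Q2', 'E3'): 7}
print(rollup("team"))      # {('2024-Q1', 'North'): 7, ('2024-Q2', 'North'): 4, ('2024-Q2', 'South'): 7}
```

Storing only quarterly team totals would make the per-employee view impossible to recover, which is why the grain is usually set at the finest level the requirements need.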
▪A multinational sales company has its headquarters in New Delhi. They sell various products which can be categorized and sub-categorized in terms of their products and brands. Their products are sold in different countries such as India, Australia, Canada, France, Germany and Denmark. The President of the company wants category-wise sales (products and brands) and order quantity information for each individual employee, and also team-wise, for every quarter to view business productivity.

▪ Questions:

▪Create a dimensional model (snowflake) for the above-mentioned scenario. Specify the dimensions and the fact(s).
▪Explain the major components of the data warehousing process using a generic conceptual diagram

[Figure: snowflaked product dimension with the tables Category, Sub category, Brand, Product and Pack size]
