The dimensional model includes dimensions for time, products, employees, and locations connected to a central fact table containing sales quantities. The products dimension is structured in a snowflake schema with categories, subcategories, brands, and products levels to provide multiple granularities of product data. This model allows analyzing sales by time period, product type, employee, and location.
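As a rough illustration of the model summarized above, the following sketch declares the four dimensions, the snowflaked product hierarchy, and the central fact table in an in-memory SQLite database. All table and column names are hypothetical. The placement of brands (as a second parent of products, alongside subcategories) is also an assumption, since the summary only lists the levels.

```python
import sqlite3

# Hypothetical names throughout: the summary names the dimensions and the
# product levels, but not the tables, columns, or key layout.
SCHEMA = """
-- Snowflaked product dimension: category -> subcategory -> product, plus brand.
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE dim_subcategory (
    subcategory_id   INTEGER PRIMARY KEY,
    subcategory_name TEXT NOT NULL,
    category_id      INTEGER NOT NULL REFERENCES dim_category(category_id)
);
CREATE TABLE dim_brand (
    brand_id   INTEGER PRIMARY KEY,
    brand_name TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_id     INTEGER PRIMARY KEY,
    product_name   TEXT NOT NULL,
    subcategory_id INTEGER NOT NULL REFERENCES dim_subcategory(subcategory_id),
    brand_id       INTEGER NOT NULL REFERENCES dim_brand(brand_id)
);

-- Remaining dimensions, kept flat (star-style).
CREATE TABLE dim_time (
    time_id   INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    quarter   INTEGER NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE dim_employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL,
    team_name     TEXT NOT NULL
);
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    country     TEXT NOT NULL
);

-- Central fact table: one row per sale, pointing at the lowest level of
# each dimension is expressed through the foreign keys below.
CREATE TABLE fact_sales (
    time_id        INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id     INTEGER NOT NULL REFERENCES dim_product(product_id),
    employee_id    INTEGER NOT NULL REFERENCES dim_employee(employee_id),
    location_id    INTEGER NOT NULL REFERENCES dim_location(location_id),
    order_quantity INTEGER NOT NULL
);
""".replace("# each dimension is expressed through the foreign keys below.",
            "-- each dimension through the foreign keys below.")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Only the product dimension is normalized into several tables here; the fact table is the same as it would be in a plain star schema.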
Key concepts
▪ Data modeling
▪ Multidimensional modeling
▪ Star schema, snowflake schema

Data Modeling Basics
▪ Entity: anything real or abstract for which data is to be stored. Entities can be roles, events, locations, tangible things, or concepts.
▪ Attribute: a characteristic property of an entity. An entity can have multiple attributes.
▪ Cardinality of a relationship: one-to-one or one-to-many, for example person to chair (one-to-one) or project to person (one-to-many).

Types of data model
▪ Conceptual data model
  ▪ Identifies the most important entities
  ▪ Identifies the relationships between different entities
  ▪ Does not support the specification of attributes
  ▪ Does not support the specification of the primary key

Types of data model
▪ Logical data model
  ▪ Identifies all entities and relationships
  ▪ Identifies all the attributes for each entity
  ▪ Specifies the primary key for each entity
  ▪ Specifies the foreign keys
  ▪ Normalization of entities is performed

First Normal form rules
▪ The table must be two-dimensional
▪ Each row contains data about one thing or one part of the thing
▪ Each column contains data for a single attribute
▪ All entries in a column must be of the same type
▪ Each column must have a unique name
▪ No two rows may be identical
▪ Example: Name, Date of Birth, Address

Second Normal form rules
▪ 2NF = 1NF + each column depends on the ENTIRE primary key.
▪ The primary key can be made of one or more attributes.
▪ Example: Name, DOB and Hobbies
▪ Split into two 1NF tables: one with Name and DOB, and another with Name + DOB and Hobby.
▪ https://www.youtube.com/watch?v=K7vzLrGCV50&list=PLQ9AAKW8HuJ5m0rmHKL88ZyjOIKejvrj0

Types of data model
▪ Logical data model
  ▪ 1NF: every attribute must be atomic (simple and indivisible).
  ▪ 2NF: relevant when the primary key contains multiple attributes (composite primary key); every non-key attribute must depend on the whole key.
  ▪ 3NF: no non-key attribute may depend on another non-key attribute.

Normalization vs denormalization
▪ The difference between normalization and denormalization is simple.
▪ When data is normalized, it exists in one and only one source-of-truth location.
▪ Denormalized data exists in multiple summarized locations. Data living in one or many locations has important consequences for accuracy and speed.

Normalization vs denormalization
▪ Normalization is the process of reorganizing data in a database so that it meets two basic requirements:
  ▪ (1) There is no redundancy of data (all data is stored in only one place), and
  ▪ (2) Data dependencies are logical (all related data items are stored together).

Types of data model
▪ Physical model: a representation of how the model will be built in the database. It exhibits all table structures, including column names, column data types, column constraints, primary keys, foreign keys, and relationships between tables.
  ▪ Convert entities into tables/relations
  ▪ Convert relationships into foreign keys
  ▪ Convert attributes into columns/fields

Data modelling techniques
▪ Entity relationship modeling
  ▪ Identify all entities
  ▪ Identify the relationships among entities along with their cardinality
  ▪ Identify key attributes
  ▪ Identify all the relevant attributes
  ▪ Plot the ER diagram and review it with business users.

Data modelling techniques
▪ Dimensional modelling: used to understand a multidimensional view of the data (same data, multiple perspectives).
▪ It is a logical design technique for structuring data to deliver fast query performance.
▪ A dimensional database can be viewed as a cube of three or more dimensions for analyzing the data.

Data modelling techniques
▪ Dimensional modelling divides the database into (1) measurements and (2) context.
▪ Measurements are usually numeric values called FACTS.
▪ Facts are related to various contexts.
▪ Contexts are divided into independent logical clumps called DIMENSIONS.
▪ Dimensions describe the "who, what, when, where, why and how" context of measurements.

Data modelling techniques
▪ Fact table:
  ▪ Consists of various measurements.
  ▪ Stores measures of a business process and points to the lowest detail level of each dimension table
  ▪ Measures are factual/quantitative and are generally numeric
  ▪ Facts can be additive (revenue), non-additive (temperature), or semi-additive (inventory)

Data modelling techniques
▪ Dimension table
  ▪ Consists of dimensional attributes (descriptive) that describe the dimension elements to enhance comprehension
  ▪ Typically static values containing textual data or discrete numbers
  ▪ Used for query filtering/constraining

Data modelling techniques
▪ A dimension table attribute must be
  ▪ Verbose: consists of full words
  ▪ Descriptive: conveys the purpose in a few words
  ▪ Complete: must not contain missing values
  ▪ Discrete: only one value per row
  ▪ Quality assured: must not contain misspelt or impossible values

Data modelling techniques
▪ Star schema
  ▪ Consists of a large central fact table with no redundancy
  ▪ The fact table references a number of dimension tables (also known as lookup or reference tables).

Data modelling techniques
▪ Snowflake schema:
  ▪ Used when the dimensions of a star schema are detailed and highly structured, with several levels of relationship.
  ▪ Results from further expansion and normalization of the dimension tables
  ▪ The child tables may have multiple parent tables.
  ▪ Snowflaking affects only the dimension tables and does not affect the fact table.

Data modelling life-cycle
▪ Life cycle consists of
  ▪ Requirement gathering
    ▪ Source driven
    ▪ User driven
  ▪ Identify the grain: the level of detail/fineness of the data. Granularity is the level of detail of the information stored in a table.
  ▪ Identify dimensions
  ▪ Identify the facts
  ▪ Design the dimensional model

▪ A multinational sales company has its headquarters at New Delhi. It sells various products, which can be categorized and subcategorized in terms of products and brands. The products are sold in different countries such as India, Australia, Canada, France, Germany and Denmark.
The President of the company wants category-wise sales (products and brands) and order quantity information for each individual employee, and also team-wise, for every quarter to view business productivity.

Questions:
▪ Create a dimensional model (snowflake) for the above-mentioned scenario. Specify the dimensions and the fact(s).
▪ Explain the major components of the data warehousing process using a generic conceptual diagram.
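To make the first question more concrete, here is one possible sketch of how the President's report could be queried once a dimensional model exists. It is not a model answer. The tables are a deliberately reduced version of a snowflake (products point straight at categories, with no subcategory or brand level), every table and column name is hypothetical, and the sample rows are made up purely so the query has something to aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, reduced schema: just enough dimensions to answer the report.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT,
                           team_name TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales   (time_id INTEGER REFERENCES dim_time(time_id),
                           product_id INTEGER REFERENCES dim_product(product_id),
                           employee_id INTEGER REFERENCES dim_employee(employee_id),
                           order_quantity INTEGER);

-- Made-up sample rows, for illustration only.
INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product  VALUES (10, 'Phone', 1), (11, 'Laptop', 1), (20, 'Desk', 2);
INSERT INTO dim_employee VALUES (100, 'Asha', 'North'), (101, 'Ben', 'North'),
                                (102, 'Chloe', 'South');
INSERT INTO dim_time     VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales   VALUES (1, 10, 100, 5), (1, 11, 100, 2), (1, 20, 101, 3),
                                (2, 10, 102, 4), (2, 20, 100, 1);
""")


def quarterly_rollup(conn, level):
    """Sum order quantity per quarter and category, grouped by employee or team."""
    # Whitelist the grouping column because it is interpolated into the SQL text.
    if level not in ("employee_name", "team_name"):
        raise ValueError(f"unsupported level: {level}")
    sql = f"""
        SELECT t.year, t.quarter, e.{level}, c.category_name,
               SUM(f.order_quantity) AS total_quantity
        FROM fact_sales AS f
        JOIN dim_time     AS t ON t.time_id     = f.time_id
        JOIN dim_employee AS e ON e.employee_id = f.employee_id
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        JOIN dim_category AS c ON c.category_id = p.category_id
        GROUP BY t.year, t.quarter, e.{level}, c.category_name
        ORDER BY t.year, t.quarter, e.{level}, c.category_name
    """
    return conn.execute(sql).fetchall()


by_employee = quarterly_rollup(conn, "employee_name")
by_team = quarterly_rollup(conn, "team_name")

for row in by_employee:
    print("employee:", row)
for row in by_team:
    print("team:", row)
```

The same fact rows serve both reports; only the grouping attribute taken from the employee dimension changes, which is the "same data, multiple perspectives" idea behind dimensional modelling.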