Informatica: Datawarehousing Basics
INFORMATICA
Datawarehousing Basics
1. What is a data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data
in support of management's decision-making process.
ETL
Report Generation
ETL
Short for extract, transform, load: three database functions that are combined into one tool.
• Extract -- the process of reading data from the source database.
• Transform -- the process of converting the extracted data from its previous form into the required form.
• Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to form data marts and data warehouses, and to
convert databases from one format to another.
It retrieves data from various operational databases, transforms it into useful information, and
finally loads it into the data warehouse system.
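The three steps can be sketched in a few lines of Python. This is only an illustration, not how Informatica or any other ETL tool works internally: the table names orders_src and orders_dw, their columns, and the sample rows are all assumptions, and two in-memory SQLite databases stand in for the operational source and the warehouse.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) operational table."""
    return conn.execute("SELECT id, name, amount FROM orders_src").fetchall()


def transform(rows):
    """Transform: trim and upper-case names, convert amount text to a number."""
    return [(row_id, name.strip().upper(), float(amount))
            for row_id, name, amount in rows]


def load(conn, rows):
    """Load: write the transformed rows into the (hypothetical) target table."""
    conn.executemany(
        "INSERT INTO orders_dw (id, name, amount) VALUES (?, ?, ?)", rows)
    conn.commit()


# In-memory databases stand in for the operational source and the warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders_src (id INTEGER, name TEXT, amount TEXT)")
source.executemany("INSERT INTO orders_src VALUES (?, ?, ?)",
                   [(1, " alice ", "10.50"), (2, "bob", "3")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders_dw (id INTEGER, name TEXT, amount REAL)")

load(target, transform(extract(source)))
loaded = target.execute(
    "SELECT id, name, amount FROM orders_dw ORDER BY id").fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the definition above: the source is only read, the target is only written, and all reshaping happens in between.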
1. Informatica
2. Ab Initio
3. DataStage
4. BODI
5. Oracle Warehouse Builder
Report generation
In report generation, OLAP (online analytical processing) is used. It is a set of specifications
that allows client applications to retrieve data for analytical processing.
It is a specialized tool that sits between a database and the user in order to provide various analyses of the data
stored in the database. An OLAP tool is a reporting tool that generates reports useful for
decision support for top-level management.
1 sairavi.informatica@gmail.com
99520 29030
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services
     OLTP                                    OLAP
1    Application oriented (e.g., purchase    Subject oriented (subject in the sense
     order: it is a functionality of an      customer, product, item, time)
     application)
4. What are the types of data warehouse?
EDW
DATAMART
It is a subset of a data warehouse.
It is a subject-oriented database that supports the needs of individual departments in an
organization.
It is called a high-performance query structure.
It supports a particular line of business, such as sales or marketing.
5. What types of modeling are involved in data warehouse architecture?
6. What are the types of approach in DWH?
Bottom-up approach: first we develop the data marts, then we integrate these data
marts into the EDW.
Top-down approach: first we develop the EDW, then from that EDW we develop the data
marts.
[Figures: bottom-up and top-down approaches]
Bottom up
Data marts can be planned and designed without waiting for the global warehouse design.
Data marts give immediate results.
It tends to take less time to implement.
Errors in critical modules are detected earlier.
Benefits are realized in the early phases.
It is considered the best approach.
Conceptual Data Modeling
A conceptual data model includes all major entities and relationships, does not contain much
detail about attributes, and is often used in the initial planning phase.
A conceptual data model is created by gathering business requirements from various sources such as
business documents and discussions with functional teams, business analysts, smart management
experts and end users who do the reporting on the database. Data modelers create the conceptual
data model and forward it to the functional team for review.
Conceptual data modeling gives the functional and technical teams an idea of how business
requirements will be projected in the logical data model.
Logical Data Modeling
This is the actual implementation and extension of a conceptual data model. A logical data model
includes all required entities, attributes, key groups, and relationships that represent business
information and define business rules.
Physical Data Modeling
A physical data model includes all required tables, columns, relationships, and database properties
for the physical implementation of databases. Database performance, indexing strategy, physical
storage and denormalization are important parameters of a physical model.
Logical Data Model                        Physical Data Model
Represents business information and       Represents the physical implementation
defines business rules                    of the model in a database
Entity                                    Table
Attribute                                 Column
Definition                                Comment
Dimensional Data Modeling
Star schema
Snowflake schema
Starflake schema (or) hybrid schema
Multi star schema
A star schema is a logical database design that contains a centrally located fact table surrounded
by one or more dimension tables.
Because the design looks like a star, it is called a star schema.
A dimension table contains primary keys and textual descriptions.
It contains denormalized business information.
A fact table contains a composite key and measures.
The measures are key performance indicators, which are used to evaluate
enterprise performance in terms of success and failure.
E.g., total revenue, product sales, discount given, number of customers.
To generate a meaningful report, the report should draw on at least one dimension table and one fact
table.
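A small star schema can be sketched with Python's built-in sqlite3 module. The table and column names (dim_product, dim_time, fact_sales) and the sample figures are assumptions chosen for illustration only. The fact table carries a composite key made of the dimension keys plus the measures, and the report joins the fact table to one dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: a primary key plus denormalized textual descriptions.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_time (
    time_key INTEGER PRIMARY KEY,
    month    TEXT,
    year     INTEGER
);
-- Fact table: composite key of dimension keys, plus the measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    time_key    INTEGER REFERENCES dim_time(time_key),
    revenue     REAL,
    discount    REAL,
    PRIMARY KEY (product_key, time_key)
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_time VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO fact_sales VALUES
    (1, 1, 100.0, 5.0), (1, 2, 150.0, 0.0), (2, 1, 80.0, 8.0);
""")

# A report needs at least one dimension and the fact table: revenue by category.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Each dimension is reached from the fact table with a single join, which is what keeps star schema queries simple.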
Disadvantage:
Example of Star Schema:
Snowflake Schema
If the dimension tables of a star schema are split into one or more further dimension tables, the result is called a snowflake schema.
The denormalized dimension tables are split into normalized dimension tables.
In the snowflake schema, the example diagram shown below has 4 dimension tables, 4 lookup tables
and 1 fact table. The reason is that the hierarchies (category, branch, state, and month) are
broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME)
respectively, into separate lookup tables.
It increases the number of joins, which results in poorer performance when retrieving data.
A few organizations normalize the dimension tables to save space.
Since dimension tables take up little space, the snowflake schema approach may be avoided.
Bitmap indexes cannot be used effectively.
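The extra join can be seen in a minimal sketch, again using Python's sqlite3 module. The names lkp_category, dim_product and fact_sales and the sample rows are hypothetical. The category hierarchy has been broken out of the product dimension into its own lookup table, so a revenue-by-category report needs two joins where a star schema would need one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup table: the category hierarchy broken out of the product dimension.
CREATE TABLE lkp_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
-- Normalized dimension: stores only a key pointing at the lookup table.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES lkp_category(category_key)
);
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);
INSERT INTO lkp_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0), (3, 80.0);
""")

# Fact -> dimension -> lookup: one more join than the star schema equivalent.
snowflake_revenue = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN lkp_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_revenue)
```

The category name is now stored once per category rather than once per product, which is the space saving being traded against the additional join.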
Hybrid Schema
Multi Star schema
Measure Types
Surrogate Key
Joins between fact and dimension tables should be based on surrogate keys
Users should not obtain any information by looking at these keys
These keys should be simple integers
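One way to picture surrogate key assignment is a small generator that hands out the next integer for each new natural key. This is a sketch under assumptions: the class name SurrogateKeyGenerator and the sample customer codes are hypothetical, and a real warehouse would more likely use a database sequence or the ETL tool's own key generator.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Maps each natural (business) key to a meaningless integer key."""

    def __init__(self):
        self._next_key = count(start=1)
        self._key_by_natural = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or assign the next integer.
        if natural_key not in self._key_by_natural:
            self._key_by_natural[natural_key] = next(self._next_key)
        return self._key_by_natural[natural_key]


customer_keys = SurrogateKeyGenerator()

# Natural keys from the source carry meaning (country, customer number).
source_rows = [("CUST-IN-0042", 250.0),
               ("CUST-US-0007", 99.0),
               ("CUST-IN-0042", 10.0)]

# Fact rows carry only the integer surrogate key, which reveals nothing.
fact_rows = [(customer_keys.key_for(natural_key), amount)
             for natural_key, amount in source_rows]
print(fact_rows)
```

The same natural key always maps to the same integer, so the fact rows still join to the right dimension row without exposing the business code.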
The staging area is where operational data is cleaned before it is loaded into the data warehouse.
Cleaning here means merging data that comes from different sources.
It is the area where most of the ETL is done.
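Merging data from different sources in a staging step might look like the following sketch. The two source lists (crm_rows, billing_rows), the field names, and the rule that the first record seen for a customer wins are all assumptions made for illustration.

```python
def standardize(record):
    """Bring a raw source record to one common shape."""
    return {
        "customer_id": str(record["customer_id"]).strip(),
        "name": record["name"].strip().title(),
        "country": record["country"].strip().upper(),
    }


# Two hypothetical operational sources that describe customers differently.
crm_rows = [{"customer_id": 101, "name": " alice smith", "country": "in"}]
billing_rows = [
    {"customer_id": "101 ", "name": "ALICE SMITH", "country": "IN"},
    {"customer_id": "102", "name": "bob jones", "country": "us"},
]

# Staging: standardize every record, then keep one row per customer_id.
staged = {}
for raw in crm_rows + billing_rows:
    clean = standardize(raw)
    staged.setdefault(clean["customer_id"], clean)

staged_rows = list(staged.values())
print(staged_rows)
```

After standardization the two records for customer 101 become identical, so the merge keeps a single row for that customer.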
Data Cleansing
Types of Dimensions:
Conformed Dimensions
Junk Dimensions (Garbage Dimensions)
Degenerate Dimensions
A conformed dimension is one that can be shared by multiple fact tables or multiple data marts.
A junk dimension is a grouping of flag values.
A degenerate dimension is something dimensional in nature that exists in the fact table (e.g., invoice number).
It is neither a fact nor strictly a dimension attribute, but it is useful for some kinds of analysis.
Such values are kept as attributes in the fact table and are called degenerate dimensions.
Degenerate dimension:
A column in the key section of the fact table that has no associated dimension table but is used for
reporting and analysis is called a degenerate dimension or line item dimension.
For example, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in the key
section and price, quantity, amount in the measure section. In this fact table, bill_no from the key section is a single
value; it has no associated dimension table. Instead of creating a separate dimension table for that single
value, we can include it in the fact table to improve performance. So here the column bill_no is a degenerate
dimension or line item dimension.
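The bill_no example can be sketched with Python's sqlite3 module. The column names follow the example above, but the fact table is trimmed to a few columns, and the table name fact_sales and the sample rows are assumptions. Note that no dimension table is created for bill_no, yet it can still be used to group the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    customer_id INTEGER,
    product_id  INTEGER,
    bill_no     TEXT,     -- degenerate dimension: no dimension table exists
    price       REAL,
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO fact_sales VALUES
    (1, 10, 'B-1001', 5.0, 2, 10.0),
    (1, 11, 'B-1001', 3.0, 1,  3.0),
    (2, 10, 'B-1002', 5.0, 4, 20.0);
""")

# bill_no is used for analysis directly from the fact table, with no join.
bill_totals = conn.execute("""
    SELECT bill_no, SUM(amount)
    FROM fact_sales
    GROUP BY bill_no
    ORDER BY bill_no
""").fetchall()
print(bill_totals)
```

Because the value lives in the fact table itself, the per-bill report needs no join at all.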