Joins


Important aspects of Star Schema & Snowflake Schema

• In a star schema, every dimension table has a primary key.
• In a star schema, a dimension table does not have any parent table.
• In a snowflake schema, a dimension table has one or more parent tables.
• In a star schema, the hierarchies for a dimension are stored in the dimension
table itself.
• In a snowflake schema, the hierarchies are broken out into separate tables. These
hierarchies let you drill down from the topmost level to the lowermost level.
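The difference in where the hierarchy lives can be sketched in a few lines of Python. This is only an illustration: the table names, keys, and rows below are made up, and plain dictionaries stand in for database tables.

```python
# Star schema: the product dimension carries its whole hierarchy in one row.
star_product_dim = {
    1: {"product": "Laptop", "category": "Computers", "department": "Electronics"},
}

# Snowflake schema: each hierarchy level lives in its own parent table.
snow_product_dim = {1: {"product": "Laptop", "category_key": 10}}
snow_category_dim = {10: {"category": "Computers", "department_key": 100}}
snow_department_dim = {100: {"department": "Electronics"}}


def department_star(product_key):
    """One lookup: the hierarchy is stored in the dimension row itself."""
    return star_product_dim[product_key]["department"]


def department_snowflake(product_key):
    """Three lookups: walk from the dimension up through its parent tables."""
    category_key = snow_product_dim[product_key]["category_key"]
    department_key = snow_category_dim[category_key]["department_key"]
    return snow_department_dim[department_key]["department"]


print(department_star(1))
print(department_snowflake(1))
```

Both functions return the same department. The snowflake version needs an extra join per hierarchy level, while the star version repeats the category and department text on every product row.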
Starflake schema (or) Hybrid Schema
• A hybrid schema is a combination of the star and snowflake schemas.
Multi Star schema
• Multiple fact tables share a set of dimension tables.

• Conformed dimensions are reusable dimensions.
• They are dimensions that are used multiple times, or in multiple data marts.
• They are common to the different data marts.
Measure Types (or) Types of Facts
• Additive - Measures that can be summed across all dimensions.
o Ex: Sales Revenue
• Semi Additive - Measures that can be summed across some dimensions but not
across others (typically not across time).
o Ex: Current Balance
• Non Additive - Measures that cannot be summed across any dimension.
o Ex: Student attendance percentage
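A small worked example may make the three measure types easier to tell apart. The rows, column order, and numbers below are invented for illustration only.

```python
from collections import defaultdict

# Made-up fact rows: (account, day, deposit_amount, end_of_day_balance)
rows = [
    ("A", 1, 100, 100),
    ("A", 2, 50, 150),
    ("B", 1, 200, 200),
    ("B", 2, 0, 200),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts for a single day...
balance_by_day = defaultdict(int)
for account, day, _deposit, balance in rows:
    balance_by_day[day] += balance

# ...but summing the same balances across days double-counts money.
misleading_total = sum(r[3] for r in rows)

# Non-additive: a percentage cannot be summed; recompute it from its parts.
# Made-up classes: (students_present, students_enrolled)
classes = [(9, 10), (10, 40)]
per_class_pct = [100 * present / enrolled for present, enrolled in classes]
overall_pct = 100 * sum(p for p, _ in classes) / sum(e for _, e in classes)

print(total_deposits, dict(balance_by_day), misleading_total)
print(per_class_pct, overall_pct)
```

Here the per-class percentages are 90 and 25, but the overall attendance is 38 percent, which is neither their sum nor their simple average.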
Surrogate Key
• Joins between fact and dimension tables should be based on surrogate keys.
• The keys should carry no business meaning: users should not be able to obtain
any information by looking at them.
• The keys should be simple integers.
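One way to picture surrogate key assignment is the sketch below. The class name SurrogateKeyGenerator and the sample customer codes are hypothetical; in practice a database sequence or an ETL tool usually does this job.

```python
import itertools


class SurrogateKeyGenerator:
    """Hands out meaningless integer keys, one per distinct natural key."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._counter)
        return self._by_natural_key[natural_key]


customer_keys = SurrogateKeyGenerator()

# Made-up source rows: (natural customer code, sale amount)
sales = [("CUST-US-0042", 120.0), ("CUST-IN-0007", 80.0), ("CUST-US-0042", 15.5)]

# The fact rows carry only the integer surrogate key, not the business code.
fact_rows = [(customer_keys.key_for(code), amount) for code, amount in sales]
print(fact_rows)
```

The natural code "CUST-US-0042" reveals a country, whereas the surrogate key 1 reveals nothing, which is the property the bullets above ask for.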

A sample data warehouse schema


WHY IS A STAGING AREA NEEDED FOR A DWH?
• The staging area is needed to clean operational data before it is loaded into
the data warehouse.
• Cleaning here includes merging data that comes from different sources.
• It is the area where most of the ETL work is done.
Data Cleansing
• It is used to remove duplicates.
• It is used to correct wrong email addresses.
• It is used to identify missing data.
• It is used to convert data types.
• It is used to capitalize names and addresses.
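The cleansing steps listed above can be sketched as one small staging function. The function name clean_records, the field names, and the sample rows are assumptions made for this example. Note that the sketch only flags a malformed email address; it does not try to correct it.

```python
def clean_records(records):
    """Deduplicate, normalize, and flag problems in a list of raw records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Capitalize names and normalize email casing and whitespace.
        name = rec.get("name", "").strip().title()
        email = rec.get("email", "").strip().lower()

        # Convert the age from text to an integer where possible.
        age_raw = rec.get("age", "")
        age = int(age_raw) if str(age_raw).strip().isdigit() else None

        # Flag obviously malformed email addresses (a deliberately simple check).
        email_valid = email.count("@") == 1 and "." in email.split("@")[1]

        # Identify missing data.
        fields = (("name", name), ("email", email), ("age", age))
        missing = [field for field, value in fields if value in ("", None)]

        # Remove duplicates, judged on the normalized name and email.
        key = (name, email)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "email": email, "age": age,
                        "email_valid": email_valid, "missing": missing})
    return cleaned


raw = [
    {"name": "  jane doe ", "email": "Jane.Doe@Example.com", "age": "34"},
    {"name": "JANE DOE", "email": "jane.doe@example.com ", "age": "34"},
    {"name": "raj kumar", "email": "raj.kumar(at)example.com", "age": ""},
]
for row in clean_records(raw):
    print(row)
```

The second raw row normalizes to the same name and email as the first, so it is dropped as a duplicate; the third is kept but flagged for an invalid email and a missing age.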
Types of Dimensions:
There are four types of dimensions:
• Conformed Dimensions
• Junk Dimensions (Garbage Dimensions)
• Degenerate Dimensions
• Slowly Changing Dimensions
Conformed, Junk (Garbage), and Degenerate Dimensions
• A conformed dimension is one that can be shared by multiple fact tables or
multiple data marts.
• A junk dimension groups miscellaneous flags and indicator values into a single
dimension.
• A degenerate dimension is dimensional in nature but exists only in the fact
table (for example, Invoice No). It is neither a fact nor strictly a dimension
attribute, but it is useful for some kinds of analysis, so it is kept as an
attribute in the fact table.
Degenerate dimension: A column in the key section of the fact table that has no
associated dimension table but is used for reporting and analysis is called a
degenerate dimension or line item dimension.
For example, consider a fact table with customer_id, product_id, branch_id,
employee_id, bill_no, and date in the key section and price, quantity, and amount
in the measure section. In this fact table, bill_no is a single value with no
associated dimension table. Instead of creating a separate dimension table for
that single value, we can include it in the fact table to improve performance. So
here the column bill_no is a degenerate dimension or line item dimension.
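The bill_no example can be sketched as follows. The rows and the helper name amount_per_bill are made up, and only a few of the fact table's key columns are shown to keep the example short.

```python
from collections import defaultdict

# A regular dimension: the fact table points at it through a key.
customer_dim = {1: "Asha", 2: "Ben"}

# Made-up fact rows. bill_no sits in the fact table with no dimension table
# behind it, which is what makes it a degenerate dimension.
fact_sales = [
    {"customer_id": 1, "product_id": 10, "bill_no": "B-1001", "quantity": 2, "amount": 40.0},
    {"customer_id": 1, "product_id": 11, "bill_no": "B-1001", "quantity": 1, "amount": 15.0},
    {"customer_id": 2, "product_id": 10, "bill_no": "B-1002", "quantity": 3, "amount": 60.0},
]


def amount_per_bill(rows):
    """Group line items by bill_no directly, with no dimension lookup."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["bill_no"]] += row["amount"]
    return dict(totals)


print(amount_per_bill(fact_sales))

# By contrast, reporting by customer name needs a lookup in customer_dim.
print([customer_dim[row["customer_id"]] for row in fact_sales])
```

Grouping by bill_no still supports reporting at the bill level even though no bill dimension table exists.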

Informatica Architecture

The PowerCenter domain

It is the primary unit of administration.
An installation can have a single domain or multiple domains.
A domain is a collection of nodes and services.
Nodes
A node is the logical representation of a machine in a domain.
One node in the domain acts as a gateway node: it receives service requests from
clients and routes them to the appropriate service and node.
Integration Service:
The Integration Service does the actual work. It extracts data from sources,
processes it according to the business logic, and loads it into targets.
Repository Service:
The Repository Service fetches data from the repository and sends it back
to the requesting components (mostly the client tools and the Integration Service).
PowerCenter Repository:
The repository is a relational database that stores all the metadata
created in PowerCenter.
PowerCenter Client Tools:
The PowerCenter Client consists of multiple tools.
PowerCenter Administration Console:
This is a web-based administration tool that you can use to administer the
PowerCenter installation.

Q. How do you define a transformation? What are the different types of
transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes
data. The Designer provides a set of transformations that perform specific
functions. For example, an Aggregator transformation performs calculations on
groups of data. Below are some of the transformations available in Informatica:
• Aggregator
• Custom
• Expression
• External Procedure
• Filter
• Input
• Joiner
• Lookup
• Normalizer
• Rank
• Router
• Sequence Generator
• Sorter
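
As a rough analogy only, the idea of chaining transformations can be sketched in plain Python. This is not Informatica code or its API: the function names and rows are invented, and they only imitate what a Filter, an Expression, and an Aggregator do to rows passing through a mapping.

```python
from collections import defaultdict


def filter_rows(rows, condition):
    """Filter analogy: pass through only the rows that meet the condition."""
    return [row for row in rows if condition(row)]


def add_amount(rows):
    """Expression analogy: derive a new column row by row."""
    return [dict(row, amount=row["price"] * row["quantity"]) for row in rows]


def aggregate(rows, group_key, value_key):
    """Aggregator analogy: perform a calculation on groups of rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Made-up source rows.
source = [
    {"branch": "North", "price": 10.0, "quantity": 2},
    {"branch": "North", "price": 5.0, "quantity": 0},
    {"branch": "South", "price": 4.0, "quantity": 5},
]

sold = filter_rows(source, lambda row: row["quantity"] > 0)
target = aggregate(add_amount(sold), "branch", "amount")
print(target)
```

Each function takes rows in and hands rows (or grouped results) out, which mirrors how data passes from one transformation to the next.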
