Data Warehouse Final Notes
Fact table: The fact table is the central table in the star schema. It stores
quantitative measures, such as sales amounts and product quantities, together
with the keys that link each row to the dimension tables.
Dimension tables: The dimension tables store descriptive data, such as
product names, customer names, and sales dates.
Foreign keys: The fact table is connected to the dimension tables by foreign
keys. A foreign key is a column in the fact table that refers to the primary
key of a dimension table. When these keys are enforced, the data warehouse
maintains referential integrity, which means that every key value in the fact
table matches an existing row in the corresponding dimension table.
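As a rough illustration of the three parts above, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table and column names (dim_product, dim_customer, fact_sales) and the sample rows are made up for this example; a real warehouse would have more dimensions and its own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
-- Central fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL REFERENCES dim_product(product_id),
    customer_id  INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    quantity     INTEGER NOT NULL,
    sales_amount REAL NOT NULL
);
""")

conn.execute("INSERT INTO dim_product VALUES (1, 'Laptop')")
conn.execute("INSERT INTO dim_customer VALUES (10, 'Alice')")
conn.execute("INSERT INTO fact_sales VALUES (100, 1, 10, 2, 1998.0)")

# A fact row pointing at a product that does not exist is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (101, 99, 10, 1, 50.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A typical star query: one JOIN from the fact table to each dimension.
rows = conn.execute("""
    SELECT p.product_name, c.customer_name, f.quantity, f.sales_amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_customer c ON c.customer_id = f.customer_id
""").fetchall()
```

The rejected insert is what referential integrity looks like in practice: the fact table cannot refer to a dimension row that is not there.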
| No. | Star schema | Snowflake schema |
|-----|-------------|------------------|
| 4.  | It takes less time to execute queries because there are fewer JOINs between tables. | It takes more time than the star schema to execute queries because there are more JOINs between tables. |
| 10. | It has high data redundancy. | It has low data redundancy. |
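The two rows above can be made concrete with a small sketch, assuming the comparison is between a star schema and a normalized (snowflake-style) schema. All table names and sample values here are hypothetical. The star dimension repeats the category name on every product row, while the normalized version stores each category once and needs one extra JOIN to reach it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: category name is repeated on every product row (redundant).
CREATE TABLE star_dim_product (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT
);
-- Normalized: each category is stored once in its own table.
CREATE TABLE snow_dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE snow_dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES snow_dim_category(category_id)
);
CREATE TABLE fact_sales (product_id INTEGER, sales_amount REAL);

INSERT INTO star_dim_product VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
INSERT INTO snow_dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO snow_dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 200.0), (1, 1000.0);
""")

# Star: one JOIN is enough to total sales per category.
star_result = conn.execute("""
    SELECT p.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN star_dim_product p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

# Normalized: the same question needs a second JOIN to reach the category.
snow_result = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN snow_dim_product p ON p.product_id = f.product_id
    JOIN snow_dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

Both queries return the same totals; the difference is the trade-off in the table, fewer JOINs on one side and less repeated data on the other.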
Fact tables: The fact tables are the central tables in the fact constellation
schema. They store quantitative measures, such as sales amounts and product
quantities.
Dimension tables: The dimension tables store descriptive data, such as
product names, customer names, and sales dates.
Shared dimension tables: The shared dimension tables are dimension
tables that are used by multiple fact tables. Because the shared data is
stored only once, every fact table sees the same descriptive values, which
helps data integrity and avoids duplicate storage.
Foreign keys: The fact tables and the dimension tables are connected by
foreign keys. A foreign key is a column in a fact table that refers to the
primary key of a dimension table. When these keys are enforced, the data
warehouse maintains referential integrity, which means that every key value
in a fact table matches an existing row in the corresponding dimension table.
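A minimal sketch of a fact constellation, again using sqlite3 with made-up names: two fact tables (fact_sales and fact_shipments) share the same dim_date and dim_product tables. Each fact table is aggregated on its own before the results are combined through the shared date dimension, which is one common way to avoid double counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Shared dimensions: stored once, referenced by both fact tables.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, calendar_date TEXT NOT NULL);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);

CREATE TABLE fact_sales (
    date_id      INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id   INTEGER NOT NULL REFERENCES dim_product(product_id),
    sales_amount REAL NOT NULL
);
CREATE TABLE fact_shipments (
    date_id       INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id    INTEGER NOT NULL REFERENCES dim_product(product_id),
    units_shipped INTEGER NOT NULL
);

INSERT INTO dim_date VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO dim_product VALUES (1, 'Laptop');
INSERT INTO fact_sales VALUES (1, 1, 999.0), (2, 1, 1998.0);
INSERT INTO fact_shipments VALUES (1, 1, 1), (2, 1, 2);
""")

# Aggregate each fact table separately, then line them up by the shared date.
rows = conn.execute("""
    SELECT d.calendar_date, s.total_sales, sh.total_units
    FROM dim_date d
    JOIN (SELECT date_id, SUM(sales_amount) AS total_sales
          FROM fact_sales GROUP BY date_id) s ON s.date_id = d.date_id
    JOIN (SELECT date_id, SUM(units_shipped) AS total_units
          FROM fact_shipments GROUP BY date_id) sh ON sh.date_id = d.date_id
    ORDER BY d.calendar_date
""").fetchall()
```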
The fact constellation schema is a more complex data warehouse design model
than the star schema. However, it can offer some advantages, such as improved
data integrity, performance, and scalability.
However, there are also some disadvantages to using a fact constellation schema:
Business intelligence (BI) is a broad term for the process of collecting, analyzing,
and presenting data to help businesses make better decisions. Data warehousing is
a key component of BI, as it allows businesses to store and organize large amounts
of data in a way that can be easily accessed and analyzed.
Data warehouses can be a valuable tool for strategic planning, but they are not a
silver bullet. To be successful, businesses need to have a clear understanding of
their goals and objectives, and they need to be able to effectively use the data that is
stored in the data warehouse.
Here are some of the benefits of using a data warehouse for strategic planning:
ETL stands for Extract, Transform, Load. It is a process used in data
warehousing to extract data from various sources, transform it into a format suitable
for loading into a data warehouse, and then load it into the warehouse. The process
of ETL can be broken down into the following three stages:
Extract: The first stage in the ETL process is to extract data from various sources
such as transactional systems, spreadsheets, and flat files. This step involves
reading data from the source systems and storing it in a staging area.
Transform: In this stage, the extracted data is transformed into a format that is
suitable for loading into the data warehouse. This may involve cleaning and
validating the data, converting data types, combining data from multiple sources, and
creating new data fields.
Load: After the data is transformed, it is loaded into the data warehouse. This step
involves creating the physical data structures and loading the data into the
warehouse.
The ETL process is an iterative process that is repeated as new data is added to the
warehouse. The process is important because it ensures that the data in the data
warehouse is accurate, complete, and up-to-date. It also helps to ensure that the
data is in the format required for data mining and reporting.
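The three stages can be sketched as three small functions. This is only a toy example under stated assumptions: the source is an in-memory CSV string, the "staging area" is a Python list, and the target is an SQLite table called orders. The names extract, transform, and load are chosen for this illustration and do not come from any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has an invalid amount.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7
"""

def extract(text):
    """Read raw rows from the source into a staging list, unchanged."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(staged_rows):
    """Clean names, convert data types, and drop rows that fail validation."""
    clean = []
    for row in staged_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows with a non-numeric amount
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean

def load(conn, rows):
    """Create the target table if needed and insert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

Only the two valid rows reach the orders table, with names cleaned and amounts stored as numbers.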
Extraction:
The first step of the ETL process is extraction. In this step, data is extracted from
various source systems, which can be in different formats such as relational databases,
NoSQL stores, XML, and flat files, and placed in the staging area. It is important to
store the extracted data in the staging area first rather than loading it directly into
the data warehouse, because the extracted data arrives in different formats and may be
corrupted. Loading it directly could damage the data warehouse, and rollback would be
much more difficult. This makes extraction one of the most important steps of the ETL
process.
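To show why the staging area matters, here is a small sketch with hypothetical table names (stg_customer for staging, dw_customer for the warehouse). Records from a CSV source and a JSON source both land in staging as plain text. Only rows that pass a basic check are copied into the warehouse table, so the bad row never touches it.

```python
import csv
import io
import json
import sqlite3

# Two made-up sources in different formats; the second CSV row has no id.
CSV_SOURCE = "id,name\n1,Alice\n,Bob\n"
JSON_SOURCE = '[{"id": 3, "name": "Carol"}]'

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Staging accepts anything as text; the warehouse table is strict.
CREATE TABLE stg_customer (raw_id TEXT, raw_name TEXT, source TEXT);
CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
""")

# Land every record in staging, whatever its source format.
staged = [(r["id"], r["name"], "csv") for r in csv.DictReader(io.StringIO(CSV_SOURCE))]
staged += [(str(r["id"]), r["name"], "json") for r in json.loads(JSON_SOURCE)]
conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", staged)

# Promote only rows whose id is non-empty and made up entirely of digits.
conn.execute("""
    INSERT INTO dw_customer (customer_id, customer_name)
    SELECT CAST(raw_id AS INTEGER), raw_name
    FROM stg_customer
    WHERE raw_id <> '' AND raw_id NOT GLOB '*[^0-9]*' AND raw_name <> ''
""")
conn.commit()

staged_count = conn.execute("SELECT COUNT(*) FROM stg_customer").fetchone()[0]
warehouse = conn.execute(
    "SELECT customer_id, customer_name FROM dw_customer ORDER BY customer_id"
).fetchall()
```

If the whole extract turns out to be bad, the staging table can simply be emptied and refilled without any cleanup in the warehouse.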
Transformation:
The second step of the ETL process is transformation. In this step, a set of rules or
functions is applied to the extracted data to convert it into a single standard format.
It may involve the following processes/tasks:
Loading:
Loading is the process of writing the data into the target database. During the load
step, it is necessary to ensure that the load is performed correctly and with as few
resources as possible.
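One possible way to meet both goals, shown here as a sketch with sqlite3 and a hypothetical fact_sales table: insert each batch with a single executemany call inside one transaction. The batch then either loads completely or not at all, and the rows are not committed one at a time. The helper name load_batch is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, sales_amount REAL NOT NULL)"
)

def load_batch(conn, rows):
    """Insert all rows in one transaction; return False if the batch was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = load_batch(conn, [(1, 10.0), (2, 20.0)])
# This batch fails because sale_id 1 already exists, so row 3 is rolled back too.
ok_second = load_batch(conn, [(3, 30.0), (1, 99.0)])
row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

After the failed second batch the table still holds only the two rows from the first batch, so a partial load cannot leave the warehouse in a half-updated state.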
ADVANTAGES AND DISADVANTAGES:
Improved data quality: The ETL process helps ensure that the data in the data
warehouse is accurate, complete, and up-to-date.
Better data integration: The ETL process helps to integrate data from multiple
sources and systems, making it more accessible and usable.
Increased data security: The ETL process can help to improve data security by
controlling access to the data warehouse and ensuring that only authorized
users can access the data.
Improved scalability: The ETL process can help to improve scalability by
providing a way to manage and analyze large amounts of data.
Increased automation: ETL tools and technologies can automate and
simplify the ETL process, reducing the time and effort required to load and
update data in the warehouse.
Here are five important challenges for ETL (Extract, Transform, Load):
- An independent data mart is built separately from the data warehouse and is
  created specifically for a particular department or business unit.
- It may use data from multiple sources, including the data warehouse, but it
  has its own data integration and transformation processes.
- This type of data mart provides more autonomy and flexibility but requires
  additional effort for data integration.
Hybrid Data Mart: A hybrid data mart combines elements of both dependent and
independent data marts. It leverages the data warehouse for common data elements
while also incorporating department-specific data integration and transformation
processes. This type of data mart strikes a balance between central control and
departmental autonomy.
Difference between OLAP and OLTP
| Category    | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
|-------------|-------------------------------------|--------------------------------------|
| Definition  | It is well-known as an online database query management system. | It is well-known as an online database modifying system. |
| Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS). |
| Application | It is subject-oriented. Used for data mining, analytics, decision making, etc. | It is application-oriented. Used for business tasks. |
| Task        | It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks. |
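The contrast between the two workloads can be sketched with a single hypothetical orders table in sqlite3. This is only for illustration: in practice OLTP runs on an operational database and OLAP on a data warehouse, as the comparison states. The OLTP-style statements insert or modify individual rows in short transactions, while the OLAP-style query reads many rows and summarizes them along two dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER PRIMARY KEY, region TEXT, order_month TEXT, amount REAL)"
)

# OLTP-style work: small transactions that insert or modify individual rows.
with conn:
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "East", "2024-01", 100.0),
            (2, "East", "2024-02", 150.0),
            (3, "West", "2024-01", 80.0),
            (4, "West", "2024-01", 20.0),
        ],
    )
with conn:
    conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")

# OLAP-style work: a read-only aggregate across two dimensions (region, month).
summary = conn.execute("""
    SELECT region, order_month, SUM(amount)
    FROM orders
    GROUP BY region, order_month
    ORDER BY region, order_month
""").fetchall()
```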