
ROBERT GORDON
UNIVERSITY

DATA WAREHOUSING
CMM 701

Name : MTM. NAASIR


RGUID : 2330847
Submission : November 11, 2023
Table of Contents
Table of Tables
Table of figures
SECTION A – ESSAY
Comparison of theories of Kimball and Inmon in relation to Data Warehouse design
Real Time Data warehousing: Innovations and Implications

Section B
ER diagram for toy store
Dimensional Modelling of toy store

Section D: Change data capture
Implementation of in-built change data capture in MS SQL Server databases
Implementation of CDC using triggers

References

Table of Tables

Table 1: Data Warehouse Approach Quick Comparison

Table of figures

Figure 1: Inmon Approach Diagram
Figure 2: Kimball Approach Diagram
Figure 3: Illustration of Real Time Data Warehousing
Figure 4: ER diagram of a Toy store
Figure 5: Dimensional modelling of toy store
Figure 6: Created Tables and Database
Figure 7: Sales table design view
Figure 8: Sales table relationship mappings
Figure 9: [cdc].[dbo_dim_product_CT] table after operations
Figure 10: dim_product_CDC table after operations
SECTION A – ESSAY

Comparison of theories of Kimball and Inmon in relation to Data Warehouse design

Many firms now invest in data because they recognize its potential as a source of revenue, for example
through advertising, market research, e-commerce, and personalization strategies. Data warehousing is
critical for using and managing that data successfully. The Inmon and Kimball methods are two popular
data warehouse design methodologies. The following sections consider how each might be applied in a
different scenario: a bank for Inmon and a retailer for Kimball.

Inmon’s top-down approach:

Imagine a bank with several departments, such as loans, deposits, and mortgages. Each department is
responsible for its own operating systems and databases. The Inmon strategy would begin by
establishing an enterprise-wide data warehouse that would integrate data from all these disparate
sources. This centralized data warehouse would provide consistency throughout the organization.

Figure 1: Inmon Approach Diagram

Once the corporate data warehouse is in place, the bank can use it to develop department-specific data
marts. For example, the loans department may maintain a data mart containing fact tables about loan
transactions. These data marts cater to the individual needs of each department, giving them the
data they require in an easy-to-use, easy-to-understand format.

Kimball’s bottom-up approach:

Now consider a retailer that has separate business processes for sales, inventory management, and
customer feedback. Rather than constructing an enterprise-wide data warehouse, as the Inmon strategy
does, the Kimball approach would begin by constructing discrete data marts for each of these business
operations.

Figure 2: Kimball Approach Diagram

The sales department, for example, could have a data mart with dimensions for product, time, and place,
as well as facts for quantity sold and total sales. These data marts can be built quickly and iteratively,
allowing the retailer to obtain insights and make data-driven decisions sooner.

Individual data marts can be merged over time to create a consolidated view of the entire business. This
is effectively a "virtual" data warehouse formed by combining multiple data marts.

Table 1: Data Warehouse Approach Quick Comparison

Criterion                     | Inmon                       | Kimball
------------------------------|-----------------------------|------------------------------
Launch time                   | Time-consuming              | Efficient
Cost                          | High initial, lower ongoing | Low initial, higher in phases
Skill requirement             | Specialist team             | Generalist team
Data integration requirements | Enterprise-wide             | Individual business areas
Maintenance                   | Lower                       | Higher

Conclusion:

Inmon and Kimball are not mutually exclusive, and the choice between them should be aligned with
the specific circumstances and objectives of the organisation. Because of its normalised data model,
centralization, and emphasis on data integrity, Inmon is well-suited to larger and more complex
companies. Kimball's denormalized approach, flexibility, and agility in data analysis are better suited
to smaller, more nimble organisations. Organisations should evaluate issues such as data complexity,
centralization requirements, storage capacity, and reporting requirements when making a decision.

Real Time Data warehousing: Innovations and Implications

Data warehousing has reached an important phase in its evolution in this age of rapidly changing digital
technologies. This essay discusses real-time data warehousing, outlining the innovations, significance,
and potential of this technology in the data landscape.

Traditional data warehouses consolidate historical data from various sources, providing comprehensive
insights into an organization's past activities. However, these insights are often outdated. Real-time data
warehousing (RTDW), in contrast, enhances this approach by continually updating stored data. This
dynamic method enables instant analysis and decision-making, offering a current snapshot of the
organization's operations at any given moment.

Figure 3: Illustration of Real Time Data Warehousing

An RTDW allows you to store real-time data and evaluate it almost instantly, in contrast to standard
data warehouses that hold historical data for subsequent analysis. This system allows for real-time
reporting and ad hoc analytics, as well as continuous processing of real-time data with low latency and
high throughput performance. Businesses may accelerate growth, make quick decisions, and give their
clients value by utilising RTDW. But it's crucial to remember that because an RTDW requires more
sophisticated infrastructure and technology than standard data warehouses, its implementation and
upkeep can be more difficult and expensive.

The following technologies underpin real-time data warehouses.

In-memory processing involves storing data in the main memory of a computing environment rather
than on traditional physical disks. This technology significantly speeds up data processing and enables
real-time analytics. It is particularly beneficial in RTDW as it allows data to be analyzed in real-time,
facilitating faster reporting and decision-making in business. However, it’s important to note that in-
memory analytics platforms can be expensive and challenging to scale.

Stream processing is a computing method where data is processed in real-time as it is generated. This
is radically different from traditional data pipelines, where data is stored in a data warehouse and then
processed in batches. Stream processing involves the continuous transmission and processing of raw
data, allowing the processing and analyzing of such data immediately when it is generated. It enables
real-time analytics, providing insights within milliseconds of the data being produced. Stream
processing is essential in scenarios where time-sensitive actions are crucial.

Change Data Capture (CDC) is a process that identifies and captures changes made in a database or
source application, then delivers those changes in real-time to a downstream process, system, or data
lake. CDC technology lets users apply changes downstream throughout the enterprise. With CDC, only
data that has changed is synchronized, which is far more efficient than replicating an entire
database. This technology is particularly useful in RTDW as it ensures organizations always have access
to the most recent data.

Real-time data warehousing is revolutionizing the way businesses harness the power of data. From
immediate insights to agile decision-making, the benefits of real-time data warehousing are undeniable.
By adopting real-time data warehousing solutions, organizations can unlock the full potential of their
data, gain a competitive edge, and drive growth in today’s fast-paced business environment. Embracing
real-time data warehousing is not just a necessity but a strategic imperative for businesses. The
potential benefits of a real-time data warehousing system include the following:

• Real-time data warehousing allows businesses to process and analyze data as it comes in,
providing immediate insights. This enables businesses to identify trends and make data-driven
decisions quickly.
• By integrating data from multiple sources in real-time, businesses can eliminate data silos and
ensure a unified view of their operations. This leads to better coordination between
departments, improved resource allocation, and optimized decision-making processes.
• It enables businesses to monitor key metrics and indicators continuously, allowing for early
detection of potential risks and vulnerabilities. This proactive approach to risk management
can help mitigate impact and minimize potential losses.
• It allows businesses to collect, analyze, and act upon customer data in real-time. This enables
businesses to deliver highly tailored products, services, and marketing campaigns based on
individual customer preferences and behaviors.

• It also enhances the capabilities of business intelligence (BI) systems. By integrating real-time
data into BI dashboards and reports, businesses can gain a comprehensive and up-to-date
view of their business performance. This allows for faster identification of trends,
opportunities, and challenges, enabling businesses to fine-tune their strategies and optimize
processes.

While data warehousing offers numerous benefits, it also presents several challenges:

Managing Data Structure and Optimization: As the volume of data in a warehouse increases,
structuring the data for future operations becomes increasingly complex, potentially slowing down the
Extract, Transform, Load (ETL) process. System optimization requires careful design and
configuration of data analysis tools to align with business needs.

Costs of Data Warehousing: The financial investment required to set up and maintain a data
warehouse can be significant.

Data Quality and Accuracy: Ensuring that the data in the warehouse is accurate, up-to-date, and of
high quality is a constant challenge.

Integration with Other Systems: Designing the data warehouse so that it can be easily integrated
with other systems is another challenge.

These challenges underscore the need for careful planning, efficient design, and ongoing management
to ensure the successful implementation and operation of a data warehouse.

Emerging trends in real-time data warehousing are shaping the future of data management. One such
trend is the integration of Artificial Intelligence (AI) into real-time data warehousing (RTDW). AI can
assist in various aspects of data warehousing, such as automating the Extract, Transform, Load (ETL)
process, enabling smart data modeling, and facilitating automated data cleansing. By applying AI
capabilities at the data source, businesses can eliminate inefficiencies, gain greater operational
awareness, and improve decision-making.

Another significant trend is the rise of edge computing in real-time analytics. Edge computing is a
distributed computing paradigm that brings computation and data storage closer to the source of data
generation. This enables real-time processing and analysis, reducing latency, and enhancing efficiency.
Real-time data processing at the edge is essential for applications that require immediate responses,
such as monitoring critical infrastructure or making split-second decisions in autonomous systems.
Edge computing also optimizes bandwidth usage by processing and filtering data at the edge,
transmitting only relevant information to the cloud or data center.

These innovations are revolutionizing the way businesses harness the power of data, enabling
immediate insights, agile decision-making, and improved operational efficiency. As these trends
continue to evolve, real-time data warehousing will undoubtedly play a crucial role in shaping the future
of data-driven decision making.

Section B

ER diagram for toy store

Figure 4: ER diagram of a Toy store

I made the following assumptions:

• Branch name (Store_name) is unique.
• One city can have multiple branches.
• Every barcode scan of a product at the POS is recorded in a transaction.
• Only one promotion, selected at the POS, can be applied at a time.

I have also attached an Excel sheet, dw.xlsx, to illustrate the source-to-target mapping of this data
warehouse.

Dimensional Modelling of toy store

Figure 5: Dimensional modelling of toy store

I created two databases: TOYSTORE_OLTP to record the toy store's transactions, and TOYSTORE_OLAP
for analytics. The analysis covers sales by period, geography, product, and customer. I then created the
tables and mapped their relationships. The following screenshots illustrate the process.
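As a rough illustration of the product-wise analysis this model is meant to support, the query below aggregates sales by product category. It is only a sketch: dim_product, product_key, and category are taken from the tables used later in this report, while fact_sales, quantity, and total_amount are hypothetical names standing in for the actual fact table and its measures.

```sql
USE TOYSTORE_OLAP;

-- Hypothetical names: fact_sales, quantity and total_amount are placeholders
-- for the real fact table and measures; adjust them to the actual schema.
SELECT p.category,
       SUM(f.quantity)     AS units_sold,
       SUM(f.total_amount) AS revenue
FROM dbo.fact_sales AS f
JOIN dbo.dim_product AS p
    ON p.product_key = f.product_key   -- surrogate key join to the dimension
GROUP BY p.category
ORDER BY revenue DESC;
```

The same pattern, joining the fact table to one dimension and grouping by one of its attributes, would apply to the period, geography, and customer analyses.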

Figure 6: Created Tables and Database

Figure 7: Sales table design view

Figure 8: Sales table relationship mappings

Note: The table definitions for the OLTP and OLAP databases are available in the create_table_OLTP and
create_table_OLAP script files.

Section D: Change data capture

Change data capture (CDC) is a set of software processes that detects changes in source tables and
databases and records them, often in real time, so that downstream systems can be updated. Because
CDC moves data as changes occur, it helps organizations work with their data more effectively.

Implementation of in-built change data capture in MS SQL Server databases

The in-built CDC feature is not available in SQL Server Express edition, so I used SQL Server
Developer edition for this use case. The required tables must already exist in the database; I used the
TOYSTORE_OLAP database and the dim_product table.

As a first step, the database owner must be changed to the sysadmin login ([sa]). After that, CDC is
enabled on the database and then on the table.
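The ownership change can be sketched as follows. This is a minimal example that assumes the sa login exists on the instance; ALTER AUTHORIZATION is the standard T-SQL statement for changing a database owner, and the SELECT afterwards only confirms the result.

```sql
-- Make the sa login the owner of the database before enabling CDC.
ALTER AUTHORIZATION ON DATABASE::TOYSTORE_OLAP TO sa;

-- Confirm the new owner of the database.
SELECT name, SUSER_SNAME(owner_sid) AS owner_name
FROM sys.databases
WHERE name = N'TOYSTORE_OLAP';
```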

Enable CDC on the Database:

USE TOYSTORE_OLAP;
EXEC sys.sp_cdc_enable_db;

Enable CDC on the Table:

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'dim_product',
@role_name = N'dbo_dim_product_CDC',
@supports_net_changes = 1;
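Before moving on, it can be useful to confirm that both steps took effect. The following check is a sketch that uses only SQL Server catalog views and the CDC helper procedure; the database and table names are the ones used in this report.

```sql
USE TOYSTORE_OLAP;

-- is_cdc_enabled = 1 means CDC is enabled for the database.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'TOYSTORE_OLAP';

-- is_tracked_by_cdc = 1 means the table has a capture instance.
SELECT s.name AS schema_name, t.name AS table_name, t.is_tracked_by_cdc
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = N'dbo' AND t.name = N'dim_product';

-- Lists the capture instance, its captured columns and related settings.
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'dim_product';
```

Note that @supports_net_changes = 1 requires the source table to have a primary key or a unique index.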

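Create Capture and Cleanup Jobs:

SQL Server normally creates both jobs automatically when the first table in a database is enabled for
CDC, so the calls below are needed only if a job has been dropped and must be recreated. Both jobs run
under SQL Server Agent, which must be running.

• Capture Job: This job captures the changes and stores them in the CDC tables.

EXEC sys.sp_cdc_add_job @job_type = N'capture';

• Cleanup Job: This job removes old data from the CDC tables.

EXEC sys.sp_cdc_add_job @job_type = N'cleanup';

Start CDC Jobs:

EXEC sys.sp_cdc_start_job @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'cleanup';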
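To see whether the jobs exist and how they are configured, the built-in helper procedure can be called as in the short sketch below. It only reads the job settings and changes nothing.

```sql
USE TOYSTORE_OLAP;

-- Shows the capture and cleanup jobs with their settings
-- (for example the polling interval and the retention period).
EXEC sys.sp_cdc_help_jobs;
```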

After that, I ran insert, update, and delete queries to validate the functionality. Those queries are
attached as the CDC_INSERT.sql, CDC_UPDATE.sql, and CDC_DELETE.sql files in the file
directory.

After applying those operations, I was able to view the changes in the system table
[cdc].[dbo_dim_product_CT], as follows:

Figure 9: [cdc].[dbo_dim_product_CT] table after operations

The __$operation column indicates which type of operation occurred on the row identified by product_key
(1 = delete, 2 = insert, 3 = update, before image, 4 = update, after image). This information is
enough to trace back all the changes detected after CDC was enabled.
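The change table can also be read through the query function that SQL Server generates for each capture instance. The sketch below assumes the default capture instance name dbo_dim_product, which is inferred from the change table name dbo_dim_product_CT; it translates the operation codes into labels and maps each change to its commit time.

```sql
USE TOYSTORE_OLAP;

-- Read every change currently held for the capture instance.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_dim_product');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT
    CASE __$operation
        WHEN 1 THEN 'DELETE'
        WHEN 2 THEN 'INSERT'
        WHEN 3 THEN 'UPDATE (before)'
        WHEN 4 THEN 'UPDATE (after)'
    END AS operation_name,
    sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS commit_time,
    product_key
FROM cdc.fn_cdc_get_all_changes_dbo_dim_product(@from_lsn, @to_lsn, N'all update old')
ORDER BY __$start_lsn, __$seqval;
```

The N'all update old' option returns both the before and after images of each update; with N'all', only the after image is returned.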

Note: All related queries for this section are attached in the cdc-auto folder of the zip file.

Implementation of CDC using triggers

We can also build change data capture at the application level by defining database triggers and
writing our own change log to shadow tables. Triggers fire after INSERT, UPDATE, or DELETE
commands (each of which indicates a change) and are used to populate the change log.

I created an audit table and associated triggers for CDC on the dim_product table.

CREATE TABLE [dbo].[dim_product_CDC](
    [CDC_ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
    [product_key] [numeric](18, 0) NULL,
    [operation] [varchar](10) NULL,
    [capture_date] [datetime] DEFAULT GETDATE(),
    [old_product_id] [numeric](18, 0) NULL,
    [old_product_name] [varchar](50) NULL,
    [old_category] [varchar](50) NULL,
    [old_price] [money] NULL,
    [old_type] [varchar](50) NULL,
    [old_unit_stocks] [int] NULL
)

Thereafter, I implemented triggers to write the change log to the audit table.

-- Each CREATE TRIGGER must be the first statement in its batch, hence the GO separators.
CREATE TRIGGER trg_dim_product_insert
ON dbo.dim_product
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- New rows have no previous values, so the old_* columns stay NULL.
    INSERT INTO dbo.dim_product_CDC (product_key, operation, old_product_id,
        old_product_name, old_category, old_price, old_type, old_unit_stocks)
    SELECT product_key, 'INSERT', NULL, NULL, NULL, NULL, NULL, NULL
    FROM inserted;
END
GO

CREATE TRIGGER trg_dim_product_update
ON dbo.dim_product
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- The deleted pseudo-table holds the row values as they were before the update.
    INSERT INTO dbo.dim_product_CDC (product_key, operation, old_product_id,
        old_product_name, old_category, old_price, old_type, old_unit_stocks)
    SELECT deleted.product_key, 'UPDATE', deleted.product_id,
        deleted.product_name, deleted.category, deleted.price, deleted.type,
        deleted.unit_stocks
    FROM deleted;
END
GO

CREATE TRIGGER trg_dim_product_delete
ON dbo.dim_product
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Keep the last known values of every deleted row.
    INSERT INTO dbo.dim_product_CDC (product_key, operation, old_product_id,
        old_product_name, old_category, old_price, old_type, old_unit_stocks)
    SELECT product_key, 'DELETE', product_id, product_name, category,
        price, type, unit_stocks
    FROM deleted;
END
GO

Thereafter, I ran a test on the database to validate this functionality.

Figure 10: dim_product_CDC table after operations

This shows what changes were made to the dim_product table.
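The audit table can be inspected with ordinary queries. The sketch below uses only the dim_product_CDC columns defined above: the first query lists the change log in the order the changes were captured, and the second summarises how many changes of each type were recorded.

```sql
USE TOYSTORE_OLAP;

-- Full change log, oldest change first.
SELECT CDC_ID, capture_date, operation, product_key,
       old_product_name, old_price, old_unit_stocks
FROM dbo.dim_product_CDC
ORDER BY CDC_ID;

-- Number of captured changes per operation type.
SELECT operation, COUNT(*) AS change_count
FROM dbo.dim_product_CDC
GROUP BY operation;
```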

Note: All related queries for this section are attached in the cdc-triggers folder of the zip file.

References

1. admin, 2023. 6 Benefits of Real-Time Data Warehousing | VLT Logistic. VLT Logistics. URL
https://www.vltlogistics.co.uk/6-benefits-of-real-time-data-warehousing/ (accessed 09.11.23).
2. Anwar, M., 2023. The Future of AI in Data Warehousing: Trends and Predictions. Astera. URL
https://www.astera.com/type/blog/ai-in-data-warehousing/ (accessed 09.11.23).
3. Data Mart vs. Data Warehouse: The Difference with Examples [WWW Document], n.d. Panoply.
URL https://panoply.io/data-warehouse-guide/data-mart-vs-data-warehouse/ (accessed 11.10.23).
4. Hand, D., 2021. Seven Common Challenges Fueling Data Warehouse Modernisation [WWW
Document]. Cloudera Blog. URL https://blog.cloudera.com/seven-common-challenges-fueling-
data-warehouse-modernisation/ (accessed 09.11.23).
5. Kimball, R., Ross, M., 2002. The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling, 2. ed. Wiley.
6. Kimball vs Inmon | 7 Amazing Key Comparisons You Should Know, 2020. EDUCBA. URL
https://www.educba.com/kimball-vs-inmon/ (accessed 07.10.23).
7. Kutay, J., 2021. Change Data Capture (CDC): What it is and How it Works. Striim. URL
https://www.striim.com/blog/change-data-capture-cdc-what-it-is-and-how-it-works/ (accessed
04.11.23).
8. Real-Time Data Warehouse Examples (Real World Applications) [WWW Document], n.d. URL
https://estuary.dev/real-time-data-warehouse-examples (accessed 10.11.23).
9. Tobin, D., n.d. How to Implement Change Data Capture (CDC) [WWW Document]. Integrate.io.
URL https://www.integrate.io/blog/how-to-implement-change-data-capture-cdc/ (accessed
11.11.23).
