Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 5

Data Warehouse (DW) - Characteristics, Architecture and

Principles

What is a Data Warehouse (DW)?

In computing, a data warehouse (DW) is often considered a business intelligence system used for
data analysis and for the day-to-day reporting needs of a business unit.

A data warehouse is a central repository of integrated data from one or more unequal sources.
The data warehouse contains all the data related to the business. Information systems can be
categorized in many aspects that can be long or limited. These include classification by purpose
of such systems in enterprises. The focus here will be on both operational information systems
and decision support systems. Operational information systems are concerned with the
management of the day-to-day operations of organizations and are the cornerstone of modern
enterprises.

Examples of such systems are Accounting systems, hospital patient management systems,
application management systems on a commercial website and student registration systems at
colleges and institutes. All these systems are concerned with the day-to-day operations of
enterprises and indispensable in modern institutions.

Obviously, operational information systems store and process a large amount of data every day,
and that data increases overtime to reach in some institutions to millions of gigabytes and
terabytes. Sometimes such data exist in insurance companies, universities, hospitals, and public
government sectors. Such data is a real asset and is invaluable if it is exploited by enterprises to
study and analyze and then presented to the decision-makers in the establishments in a way that
enables them to identify the performance of these facilities and formulate long-term strategies for
them. Hence, there was a need only for Decision Support Systems (DSS). One of the most
important applications of these systems, which is the main pillar of these systems, is Data
Warehousing.
Characteristics of Data warehouse

Data warehouses are systems that are concerned with studying, analyzing and presenting
enterprise data in a way that enables senior management to make decisions.

The data warehouses have some characteristics that distinguish them from any other data and
these characteristics are as follows:

 Subject-Oriented: For example, we define subjects in a medical facility as doctors,


nurses, medicines, diseases, and so on. They are sensual and moral things.
 Integrated: The relationship between the data and the way in which the data is extracted
and modified is the same regardless of the original source.
 None-Volatile: None-Volatile: The data warehouse is not editable when loaded in
repositories and not updated in real-time. This data is updated periodically by data
uploading, protecting it from the effect of instant change and only used for analysis,
study, and presentation.
 Time-Variant: This is the most important point that this data is related to the time factor
and organized via time periods (weekly, monthly, yearly, etc.), for example, Student
Registration Data at the university for over forty years.

5 Steps to Data Warehousing

Building a data warehouse in an organization is not an easy task, and requires a professional
team in business information systems analysis; it is also important to have professional technical
skills in that team.

Now, we will present the technical steps to build data warehouses, far from the analysis and
design phase that is very close to the business view. These steps are arranged as follows:

1. Creating Data Preparation Area: a database with very high storage capacity; It stores all
data coming from different Operational Systems so that data is purified and modified before
being loaded into the data warehouse. At this stage, the design of the database should be
compatible with the design of the data warehouse.

2. Creating Data Warehouse: A data warehouse is created where data will be uploaded after it
is extracted and purified. Data warehouses are always designed to allow relationships of different
dimensions, such as the relationship between the admissions of students to a specialization in a
given year.

3. Data Marts: Some large organizations divide the data warehouse into a set of data marts.
There is a Data Mart for financial management data and another for human resources
management or division based on the branches of the institution.

4. Extract, Transform, and Load (ETL) Process: At this stage, we extract data from different
sources to the Data Preparation Area and we convert it from one image to another if required.
We also often integrate some data with each other or define new data that did not exist before, in
addition to clearing the wrong data and deleting unimportant ones. At the stage of Loading Data,
The data is loaded from the Data Preparation Area to Data Warehouse. During this phase, the
data is tested.

5. Creating Decision Support Systems (DSS) Applications: At this stage, implement DSS
applications for displaying and analyzing the data warehouse, called Online Analytical
Processing (OLAP) Systems. These applications display data in several dimensions and use
complex algorithms to analyze data. At this stage, data mining tools are also used to recognize
patterns in the data sets and analyze big data.

Three-Tier Data Warehouse Architecture

Most organizations adopt Three-Tier Architecture in the work of data warehouses.

In this Architecture, the data warehouse system is divided into three tiers (levels); Bottom Tier,
Middle Tier, and Top-Tier.

1. Bottom Tier: It is the data warehouse database server and an RDBMS (Relational Database
Management System). This may include many specific data marts and metadata repositories.
You can use back end tools and application program interfaces called a gateway to feed data at
the bottom level. The gateways and back end tools perform the extraction, cleaning, loading,
and updating functions. Examples of gateways contain Open-Linking and Embedding for
Databases (OLE-DB), Open Database Connection (ODBC) and Java Database Connection
(JDBC).

2. Middle Tier: A middle-tier consists of an online analytical processing (OLAP) server which
can be executed for a quick query of the data warehouse. The OLAP server is implemented in the
following ways:

I. Using the ROLAP (Relational online analytical processing) model, i.e., an extended relational
database management system (DBMS) that assigns tasks to multidimensional data for standard
relational operations.

II. Using the MOLAP (multidimensional online analytics processing) model, ie a special-
purpose server that directly implements multidimensional databases and operations.

3. Top-Tier: A top-tier consists of front-end tools that contain reporting tools analysis tools and
query tools to display the results provided by online analytical processing systems as well as
additional tools for data mining.

Principles of Data Warehousing

There are some data warehouse principles that may include:

Load Processing

In a data warehouse, several phases and steps must be taken to load new data and process it
including filtering, reformatting, data conversion, indexing, and metadata updates.

Load Performance

Data warehouses need to increase the loading of new data from time to time within narrow time
windows. Load performance should be measured in gigabytes per hour and in hundreds of
millions of rows and should not artificially disrupt the data business volume.

Data Quality Management


The data warehouse verifies contextual integrity, local stability, and global consistency despite
"dirty" sources and large-scale database size. This is only useful and valuable to the extent if
business stakeholders trust the data and its resources as a fact-based management system requires
the highest quality of the data.

Strategic Adaptability

Adaptability is critical to the development of business requirements. The business intelligence


tools available in the market must be taken into account to adapt to often unexpected changes in
business demands. It is also very important to keep in mind the pace at which BI tools evolve to
include additional functions and features. In data warehouses, adaptability requires a principle
and method to use alternative BI tools in the future such as various back-end or visualization
tools.

Query Performance

Reliable database management and the fact-based direction should not be decelerated by the
performance of RDBMS (relational database management system). Massive and complex
queries should be completed in seconds, not hours or days.

Terabyte Scalability

A Terabyte contains more than a trillion bytes of digital information - 1015 bytes. Today, the
size of the data warehouses is evolving at staggering rates. This ranges from a few bytes to
hundreds of terabytes and gigabytes sized data warehouses. So, scalability should be taken into
account in measurement.

You might also like