Data Warehouse: TobiasGroup, Inc.
TobiasGroup, Inc.
8536 Crow Drive
Suite 218
Macedonia, Ohio USA 44056
330.468.2468
information@tobiasgroup.com
1. Introduction
2. Data Warehousing
   History
   The Goals of a Data Warehouse
      Makes an organization's information accessible.
      Makes the organization's information consistent.
      Is an adaptive and resilient source of information.
      Is a secure bastion that protects the organization's information asset.
      Is the foundation for decision making.
   Data Warehouse Information Flow
      Data Access
      Data Cleansing
      Business Rule Application
      Data Translation
      Warehouse Databases
      Querying
      Information Access
   Basic Elements of a Data Warehouse
      Source System
      Staging Area
      Presentation Area
      End User Data Access Tools
      Metadata
   Basic Processes of the Data Warehouse
      Conforming Dimensions
      Extracting
      Transforming
      Loading and Indexing
      Quality Assurance Checking
      Release/Publishing
      Updating
      Querying
      Data Feedback/Feeding in Reverse
      Auditing
      Securing
      Backing Up and Recovering
3. Terms and Definitions
   Data Warehousing
   Common Terms
1. Introduction
At any given time the optimal IT architecture depends on a few important factors. These
include the business requirements of the enterprise, the available technology of the time,
and the enterprise's accumulated investments in earlier technologies.
In his book The Data Warehouse Toolkit, Dr. Ralph Kimball describes the data
warehouse as "The place where people can go to access their data."
Although relatively new to the data warehousing market, the Microsoft™ set of tools is
recommended because of its price and relative ease of use. Skills are easily
transferable from one tool to another. The individual components of the Microsoft Data
Warehousing Framework (Data Transformation Services – DTS, the Relational Database
Management System – RDBMS, OLAP Services, and the Repository) are becoming the
de facto standard tools for data warehousing.
2. Data Warehousing
History
Data warehousing methodology has grown out of the need for immediate and
comprehensive access to enterprise information. Fast, informed business decisions are
no longer a competitive advantage, but a requirement.
In the past, information technology has focused almost entirely on OLTP (On Line
Transaction Processing) systems. OLTP or Operational Systems track our customers
and orders, process general ledger and other accounting data, tell us about inventory
levels and how much was spent on raw material last year, but do little to answer questions
requiring data from multiple operational systems. When someone asks such a question,
traditional MIS gathers information from the OLTP systems and delivers a new report
containing the answer. Hopefully, this only takes a few days. The goal of Data
Warehousing is to answer questions in a few minutes or seconds rather than days.
Operational systems are designed for fast data entry and storage, and immediate retrieval
of simple information, usually based on a simple query: "name and address of customer
#34865" or "quantity of item AB-11 in stock." OLTP systems do deliver sophisticated
reports, but usually not in real time. To paraphrase one IT professional: "Operational
reporting gives a perfect picture of the wake of the boat, but does little to help steer a course."
To answer questions in seconds, new system design and implementation philosophies are
required. Data Warehouses are Information Access Systems. Data must be stored in
new ways for faster access. OLTP systems are optimized for fast entry and quick record
processing. On the other hand, data warehouses must retrieve large amounts of
information quickly and deliver it to an end user’s desktop. Business information must be
extracted from the OLTP system and loaded into the Data Warehouse’s new fast access
formats.
A Data Warehouse must then be able to deliver loaded information in a variety of formats.
Hardcopy and spreadsheet "straight line" processing is no longer sufficient. A
comprehensive repository of business rules information is also required. Business
definitions about sales periods, or even the simple "what is a customer?", have previously
been defined differently in different departments and facilities. "Well yes, but my number
also includes…" or "Our period ends on…, so you can't apply that number to these
figures" are familiar comments in today's conference room. Redefining and consolidating
all the business rules in the warehouse eliminates these difficulties.
End users have many new tools for information access that require complex information.
Most of these new tools translate business language questions into database queries that
were previously only written in the MIS department. In addition, there are now many
applications that perform very sophisticated calculations and modeling.
The Goals of a Data Warehouse
One of the most important assets of an organization is its information. These assets are
usually kept by an organization in three forms: the operational systems of record;
distributed, ad-hoc, or departmental documents and databases utilized to satisfy reporting
requirements; and the data warehouse. While most of the ad-hoc documents, usually in
spreadsheets, will be replaced over time with the warehoused data, the data warehouse
will never be a substitute for the operational systems. The data warehouse has profoundly
different needs, clients, structures, and rhythms than the operational systems. The
operational systems of record are where data is put in, and the data warehouse is where
the data is taken out. While the basic goals of the operational systems are to capture the
daily transactions and to aid in the day-to-day running of that business, the data
warehouse:
Makes an organization's information accessible.
The contents of the data warehouse are understandable and navigable, and the access is
characterized by fast performance. Understandable means correctly labeled and obvious.
Navigable means recognizing the destination on the screens and getting there in one click.
Fast performance means zero wait time. In addition, accessibility of the data means the
data warehouse can be the source of information for many data-hungry business
improvement efforts such as process modeling and simulation, budgeting and forecasting,
activity-based costing, and new product development.
Makes the organization's information consistent.
Information from one part of the organization can be matched with information from
another part of the organization. If two measures of an organization have the same name,
then they must mean the same thing. Conversely, if two measures don’t mean the same
thing, then they are labeled differently. Consistent information means high quality
information. It means that all of the information is accounted for and is complete.
Is an adaptive and resilient source of information.
The data warehouse is designed for continuous change. When new questions are asked
of the data warehouse, the existing data and the technologies are not changed or
disrupted. When new data is added to the data warehouse, the existing data are not
changed or disrupted. The design of the separate data marts that make up the data
warehouse must be distributed and incremental.
Is a secure bastion that protects the organization's information asset.
The data warehouse not only controls access to the data effectively, but gives its owners
great visibility into the uses and abuses of that data, even after it has left the data
warehouse.
Is the foundation for decision making.
The data warehouse has the right data in it to support decision-making. There is only one
true output from a data warehouse: the decisions that are made after the data warehouse
has presented its evidence.
Data Warehouse Information Flow
Over the past two decades, information technology has been adapted and integrated into
all aspects of the enterprise. Information access was never ignored, but was usually
designed by operational systems professionals. OLTP and reporting systems were
departmentalized and so diverse that information gathering was problematic. Most of
today’s information access systems look something like the following:
There are several important considerations in the process model above. Data generally,
but not exclusively, moves from the left to the right. In the Data Access layer, information
is gathered from the existing operational systems. Data Staging is arguably the most
important and the most complex layer. It can be divided into Data Cleansing, Business
Rule Application, and Data Translation. Information requests are compiled in the
Querying layer. Finally, data is processed, displayed, and reported in the Information
Access layer.
Process Management applications drive the warehouse and control each layer using
Metadata Functions. Metadata, or "data about data", is used in every layer and defines
each process. At a minimum, Metadata includes data warehouse group, table, and field
names with descriptions; original data source; allowable values and formats; and simple
business rules. Ideally, complex business rules, cleansing information, source data
formats, and end user data formats are also included.
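As a sketch of what such a repository entry might contain, the following hypothetical Python example describes a single warehouse field; all names and values here are illustrative, not taken from any actual TobiasGroup metadata:

```python
# A minimal, hypothetical metadata entry for one warehouse field.
# A real repository would store far more detail than this sketch.
customer_region_meta = {
    "group": "Sales",
    "table": "dim_customer",
    "field": "region",
    "description": "Sales region assigned to the customer",
    "source": "ORDERS.CUSTMAST.RGN_CD",   # original data source (illustrative)
    "allowable_values": ["NORTH", "SOUTH", "EAST", "WEST"],
    "format": "CHAR(5)",
    "business_rule": "Region follows the ship-to address, not the bill-to",
}

def validate(value, meta):
    """Check a value against the allowable values in its metadata entry."""
    return value in meta["allowable_values"]

print(validate("NORTH", customer_region_meta))      # True
print(validate("NORTHWEST", customer_region_meta))  # False
```

Every layer of the warehouse can consult entries like this one rather than hard-coding field names, formats, and rules.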
© TobiasGroup – March 1999
Data Access
The Data Access layer is the first step when loading the warehouse with new information.
Any new data since the last load must be retrieved from all existing OLTP and operational
systems. Data Access contains tools that understand all the mainframe and PC database
formats. Metadata is used to specify which data is included and where it resides in the
operational systems.
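The incremental pull described above can be sketched as follows. Here sqlite3 merely stands in for an operational source; real Data Access tools would reach mainframe and PC database formats through their own drivers, and the table and column names are invented for illustration:

```python
import sqlite3

# An in-memory table standing in for an operational orders system.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, entered TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 100.0, "1999-02-01"),
    (2, 250.0, "1999-03-02"),
    (3, 75.0, "1999-03-03"),
])

# Metadata records where the data resides and when it was last pulled.
last_load = "1999-03-01"

# Retrieve only the rows entered since the last load.
new_rows = src.execute(
    "SELECT id, amount, entered FROM orders WHERE entered > ? ORDER BY id",
    (last_load,),
).fetchall()
print(new_rows)
```

Only orders 2 and 3 are retrieved; order 1 was already loaded in an earlier pass.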
Data Cleansing
There is virtually no computer-based information that is 100% accurate. The old OLTP
saying "garbage in, garbage out" applies even more dramatically to data warehousing.
Correcting data entry errors is only a small part of the cleansing process. Typically, data
relationship problems are the most daunting. The simple question "Who is our largest
customer?" may be answered incorrectly if data is not cleansed properly. OLTP systems
in different divisions may have separate codes for the "unrelated" customers "Digital
Equipment Corporation", "Digital", and "DEC". Products and services may be just as hard
to reconcile. Stephen Brown of Vality Technology, Inc. reports that 10 times the allotted
resources are usually spent implementing the cleansing layer.
Any rules for correcting inconsistencies should be stored in the Metadata repository for
reference and ease of modification.
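A minimal sketch of such a metadata-driven cleansing rule, using the customer example above, might look like this in Python; the surrogate code and the exact-match lookup are illustrative simplifications, since real cleansing tools rely on fuzzy matching:

```python
# Hypothetical cleansing rule, as it might be stored in the Metadata
# repository: map divergent customer spellings to one surrogate code.
customer_rules = {
    "DIGITAL EQUIPMENT CORPORATION": "C-0042",
    "DIGITAL": "C-0042",
    "DEC": "C-0042",
}

def cleanse_customer(raw_name, rules):
    """Return the conformed customer code, or None for an unknown name."""
    return rules.get(raw_name.strip().upper())

# All three source spellings now resolve to a single customer.
codes = {cleanse_customer(n, customer_rules)
         for n in ["Digital Equipment Corporation", "digital ", "DEC"]}
print(codes)
```

With the rule applied, "Who is our largest customer?" sums all three spellings under one code instead of three.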
Business Rule Application
Business Rules (as defined in Metadata) are applied in the data-staging layer. Consistent
periods and relationships are necessary to correlate departmental and divisional
information.
Data Translation
While Cleansing and Business Rule applications occur, data is translated into standard
formats and stored in the warehouse.
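As an illustration of translation into a standard format, suppose three source systems deliver the same order date in three different layouts; the source names and format strings below are assumptions for the sketch, not actual TobiasGroup systems:

```python
from datetime import datetime

# Hypothetical per-source date layouts, as Metadata might record them.
SOURCE_FORMATS = {
    "mainframe": "%y%j",        # two-digit year plus Julian day, e.g. 99061
    "accounting": "%m/%d/%Y",   # e.g. 03/02/1999
    "sales": "%d-%b-%Y",        # e.g. 02-Mar-1999
}

def translate_date(raw, source):
    """Translate a source-specific date string to the standard ISO format."""
    return datetime.strptime(raw, SOURCE_FORMATS[source]).date().isoformat()

print(translate_date("99061", "mainframe"))
print(translate_date("03/02/1999", "accounting"))
print(translate_date("02-Mar-1999", "sales"))
```

All three source values land in the warehouse as the single standard form 1999-03-02.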
Warehouse Databases
The physical warehouse data store may contain one or many standardized database
formats. Optimization of information access speed is determined by varying format
selections. Data Normalization was once thought to be a rule for any database design, but
is now known to apply to OLTP-like systems only. In a warehouse, data is frequently
“denormalized”. For example, records may contain redundant and uncoded information;
additionally, summary data is stored separately or alongside detail. The many different
data formats are defined in Metadata.
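A small sketch of denormalization, with invented table and column names: the product description is stored redundantly and uncoded on every sales row, so a business question can be answered in one scan with no joins:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# A denormalized sales table: the full product description is repeated
# (uncoded) on every detail row, trading storage for access speed.
db.execute("""CREATE TABLE sales (
    order_id INTEGER, product_desc TEXT, qty INTEGER, amount REAL)""")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "Widget, blue, 3-inch", 10, 50.0),
    (2, "Widget, blue, 3-inch", 4, 20.0),
])

# No join to a product table is needed - one scan answers the question.
total = db.execute(
    "SELECT SUM(amount) FROM sales WHERE product_desc LIKE 'Widget%'"
).fetchone()[0]
print(total)
```

In a normalized OLTP design the description would live in a separate product table and every such query would pay for a join.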
Querying
Querying is the simple and hopefully speedy process of compiling warehouse information
for delivery to an end user. In the query layer, data is gathered per user request and
translated from the standardized warehouse formats into any new formats required by end
user tools. Again, information about all the different formats is stored in Metadata.
Information Access
On the end user’s desktop are the varieties of information access tools mentioned above.
Tools with built-in warehousing technology may query the warehouse directly and produce
reports. Other tools can be populated with data by applications in the Querying layer.
Basic Elements of a Data Warehouse
From the information process flow above, the following basic elements are derived and
illustrated below:
Source System
Staging Area
A storage area and set of processes that clean, transform, combine, de-duplicate,
household, archive, and prepare source data for use in the presentation server. In many
cases, the primary objects in this area are a set of flat-file tables representing extracted
(from the source systems) data, loading and transformation routines, and a resulting set of
tables containing clean data – Dynamic Data Store. This area does not usually provide
query and presentation services.
Presentation Area
The presentation area is the set of target physical machines on which the data warehouse
data is organized and stored for direct querying by end users, report writers, and other
applications. The set of presentable data, or Analytical Data Store, normally takes the
form of dimensionally modeled tables when stored in a relational database, and cube files
when stored in an OLAP database.
End User Data Access Tools
End user data access tools are any clients of the data warehouse. An end user access
tool can be as simple as an ad hoc query tool, or can be as complex as a sophisticated
data mining or modeling application.
Metadata
All of the information in the data warehouse environment that is not the actual data itself.
This data about data is catalogued, versioned, documented, and backed up.
Basic Processes of the Data Warehouse
Conforming Dimensions
The process of aligning business users' understanding of the dimensions used in the data
warehouse. The resulting conformed dimensions are dimensions that mean the same
thing with every possible fact table to which they can be joined. Examples of obvious
conformed dimensions include customer, product, location, and calendar (time).
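The idea can be sketched with one conformed calendar dimension shared by two fact tables, so that "month" means the same thing whether the measure is sales or shipments; all table names, keys, and figures below are invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One conformed date dimension joined to two separate fact tables.
db.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (date_key INTEGER, amount REAL);
CREATE TABLE fact_shipments (date_key INTEGER, units INTEGER);
INSERT INTO dim_date VALUES (19990301, '1999-03'), (19990402, '1999-04');
INSERT INTO fact_sales VALUES (19990301, 500.0), (19990402, 300.0);
INSERT INTO fact_shipments VALUES (19990301, 12);
""")

# Because both facts share the same dimension, their measures line up
# by month without any reconciliation.
row = db.execute("""
    SELECT d.month, SUM(s.amount), SUM(sh.units)
    FROM dim_date d
    JOIN fact_sales s ON s.date_key = d.date_key
    LEFT JOIN fact_shipments sh ON sh.date_key = d.date_key
    WHERE d.month = '1999-03'
    GROUP BY d.month
""").fetchone()
print(row)
```

If each fact table carried its own private calendar, the two measures could never be compared safely across months.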
Extracting
The extract step is the first step of getting data into the data warehouse environment.
Extracting means reading and understanding the source data, and copying the parts that
are needed to the data staging area for further work.
Transforming
Once the data is extracted into the data staging area, there are many possible
transformation steps, including cleansing, combining data from multiple sources,
de-duplicating, and householding the data.
At the end of the transformation process, the data is in the form of load record images.
Loading and Indexing
Loading in the data warehouse environment usually takes the form of replicating the
dimension tables and fact tables and presenting these tables to the bulk loading facilities
of the presentation area servers.
Quality Assurance Checking
When each presentation server is loaded, indexed, and supplied with appropriate
aggregates, the last step before publishing is the quality assurance step. Quality
assurance can be checked by running a comprehensive exception report over the entire
set of newly loaded data. All of the reporting categories must be present, and the counts
and totals must be satisfactory. All reported values must be consistent with the time series
of similar values that preceded them. The exception report is probably built with an end
user report writing facility. All issues dealing with transformation should have already been
resolved.
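A toy version of the exception report described above can be sketched as follows: each newly loaded category total is compared against the average of prior loads, with a hypothetical 50% tolerance standing in for whatever "satisfactory" means in a real warehouse:

```python
# Prior load totals per reporting category (illustrative figures).
history = {"EAST": [100.0, 110.0, 105.0], "WEST": [200.0, 190.0, 210.0]}
new_load = {"EAST": 104.0, "WEST": 20.0}   # WEST looks wrong

def exceptions(history, new_load, tolerance=0.5):
    """Flag categories that are missing or outside the tolerance band."""
    flagged = []
    for category, prior in history.items():
        if category not in new_load:       # every category must be present
            flagged.append((category, "missing"))
            continue
        avg = sum(prior) / len(prior)
        if abs(new_load[category] - avg) > tolerance * avg:
            flagged.append((category, "out of range"))
    return flagged

print(exceptions(history, new_load))
```

Here EAST is consistent with its time series while WEST is flagged, so the load would be held back from publishing until the discrepancy is explained.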
Release/Publishing
The user community is notified that the new data is ready – “ring the data bell.”
Updating
Incorrect data should obviously be corrected. Changes in labels, hierarchies, status, and
corporate ownership often trigger necessary changes in the original data stored in the data
warehouse. In general, these are managed load updates, not transactional updates.
Querying
Querying is a broad term that encompasses all the activities of requesting data, report
writing, complex decision support applications, requests from models, and full-fledged
data mining. Querying never takes place in the staging area.
Data Feedback/Feeding in Reverse
When modeling tools are used in a data warehousing environment, results from these
tools are sometimes loaded into the warehouse.
Auditing
At times it is critically important to know where the data came from and what calculations
were performed.
Securing
Every data warehouse has an exquisite dilemma: the need to publish the data widely to as
many users as possible with the easiest-to-use interface, but at the same time protect the
valuable sensitive data from hackers, snoopers, and industrial spies. Data warehouse
security must be managed centrally while users must be able to access all the constituent
data with a single sign-on.
Backing Up and Recovering
The project team will decide where to take the necessary snapshots of the data for
archival purposes and disaster recovery.
3. Terms and Definitions
Data Warehousing
Data warehousing, then, is the process of building and maintaining a data
warehouse.
Below is a closer look at each of these components followed by a listing of the common
terms used within a data warehousing effort.
Integrated
The data warehouse is comprised of data from many systems. Each system may be
similar in nature or have a totally different use. For example, customer A may be an active
buyer of goods and services and their information is stored in an accounting system. They
may also be tracked in a marketing database used to track new construction projects.
Each of these systems stores data about the same customer, but neither of them knows
anything about the other. A data warehouse captures information from both of these
systems, integrates the information so they can be related, and provides new, meaningful
ways of looking at the data.
Subject Oriented
The data warehouse takes a different approach than the traditional OLTP systems. It
looks at subjects like customers, sales, and profits as opposed to the systems that focus
on one department or process.
Databases
The term data warehouse refers to the entire collection of tools, processes and hardware
required to plan, develop, implement, and use the system. At its core is a very large,
typically read-only database that collects both internal and external data and provides
unique ways of viewing the data. Internal data comes from the operational systems within
the organization. External data may come from customers, the government, research
firms, and other organizations that sell data related to your organization.
Decision-Making
Traditional operational systems are typically built on normalized relational databases that
are designed to maintain a high level of relational data integrity. Data warehouses are
denormalized in order to make the data more meaningful to the users. A data warehouse
is designed for presentation and performance, and allows for many different views.
Product managers may
be interested in sales per region while a financial manager is interested in profitability.
Common Terms
Analytical Tools
This is an umbrella phrase used to connote software that employs some sort of
mathematical algorithm(s) to analyze the data contained in the warehouse. Data Mining,
OLAP, ROLAP and other terms are used to designate types of these tools with different
functionality. In practice, however, a given analytical tool may provide more than one type
of analysis procedure and may also encompass some middleware functionality. Such
tools are difficult to classify.
Ad Hoc Query
The process of extracting and reporting information from a database through the issuance
of a structured query. Programmers usually write queries using special languages that are
associated with database management systems. Most relational database managers use
a variant of SQL (Structured Query Language, originally developed by IBM). An example of
an ad hoc query might be "How many customers called the UK between the hours of 6-8
am?" Several packages are available that make the construction of queries more
user-friendly than writing language constructs. These usually employ some sort of
graphic/visualization
front end.
Business Intelligence
A phrase coined by (or at least popularized by) Gartner Group that covers any
computerized process used to extract and/or analyze business data.
Data Cleansing
The process by which data is extracted from an operational database, cleaned and then
transformed into a format useful for a data warehouse-based application.
Data Mart
A Data Mart is a data warehouse that is restricted to dealing with a single subject or topic.
The operational data that feeds a data mart generally comes from a single set or source of
operational data.
Data Mining
Data Mining is a process by which the computer looks for trends and patterns in the data
and flags potentially significant information. An example of a data-mining query might be
"What are the psychological factors associated with child abusers?"
DBMS
A Database Management System: the software that stores, retrieves, and manages the
data in a database.
Enterprise Data Warehouse
Refers to a single collection of data designed to serve the diverse needs of an enterprise.
The opposing concept is that of a collection of smallish databases, each designed to
support a limited requirement.
A single repository holding data from several operational sources that serves many
different users, typically in different divisions or departments. An enterprise data
warehouse for a large company might, for example, contain data from several separate
divisions, and serve the needs of both those divisions and of corporate users wishing to
analyze consolidated information.
ERP (Enterprise Resource Planning)
ERP systems are comprised of software programs which tie together all of an enterprise's
various functions -- such as finance, manufacturing, sales and human resources. This
software also provides for the analysis of the data from these areas to plan production,
forecast sales and analyze quality. Today many organizations are realizing that to
maximize the value of the information stored in their ERP systems, it is necessary to
extend the ERP architectures to include more advanced reporting, analytical and decision
support capabilities. This is best accomplished through the application of data
warehousing tools and techniques.
Knowledge Discovery
A phrase coined by (or at least popularized by) Gartner Group defined as the process of
discovering meaningful new correlations, patterns and trends by sifting through large
amounts of data stored in repositories (e.g., data warehouses), using such technologies
as pattern recognition, statistics and other mathematical techniques. Knowledge
Discovery is really the same thing as data mining.
Knowledge Management
An umbrella term that is used by some in the same context as Business Intelligence.
A person or organization that designs the software for the DW/BI application.
Middleware
An umbrella term used to describe software that bridges various parts of a DW/DSS
system. For example, software that extracts, cleans or separates data.
MOLAP
A set of user interfaces, applications, and proprietary database technologies that have a
strongly dimensional flavor.
OLAP/ROLAP/MOLAP
The general activity of querying and presenting text and number data from data
warehouses, as well as a specifically dimensional style of querying and presenting that is
exemplified by a number of "OLAP" vendors. The OLAP vendors' technology is non-
relational and is almost always based on an explicit multidimensional cube of data. OLAP
databases are also known as multidimensional databases, or MDDBs.
Operational Data
Operational data is the data collected from operations such as order processing,
accounting, manufacturing, marketing, etc. Most modern companies collect most of this
data using a form of OnLine Transaction Processing (OLTP). Data generated by these
systems is generally not in a format that makes for efficient query processing or analysis.
Relational Data
Data that has been formatted and organized to work in a database designed around a
relational schema.
ROLAP
A set of user interfaces and applications that give a relational database a dimensional
flavor.