
This article was downloaded by: [141.214.17.222]
On: 25 October 2014, At: 19:40
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

Information Systems Management


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/uism20

Data Warehouse Administration and Management


Alan Benander (a), Barbara Benander (b), Adam Fadlalla (c) & Gregory James (d)
(a) An associate professor of Computer and Information Science at Cleveland State University.
(b) An associate professor of Computer and Information Science at Cleveland State University.
(c) An associate professor at Cleveland State University in the Computer and Information Science Department.
(d) Manager of Knowledge Discovery Services for National City Corp. and is pursuing his D.B.A. degree at Cleveland State University.
Published online: 21 Dec 2006.

To cite this article: Alan Benander , Barbara Benander , Adam Fadlalla & Gregory James (2000) Data
Warehouse Administration and Management, Information Systems Management, 17:1, 71-80, DOI:
10.1201/1078/43190.17.1.20000101/31217.10

To link to this article: http://dx.doi.org/10.1201/1078/43190.17.1.20000101/31217.10


DATA WAREHOUSING

DATA WAREHOUSE
ADMINISTRATION
AND MANAGEMENT
Alan Benander, Barbara Benander, Adam Fadlalla, and Gregory James

Data warehouses are huge repositories of legacy data. They are usually configured as stars or snowflakes, and their application strength is their ability to associate pieces of data in unique and multiple ways. They are very different from traditional relational databases, and as such they require administrators who possess a skill set different from that of traditional database administrators. This article explores the skills a data warehouse administrator must possess.

ALAN BENANDER is an associate professor of Computer and Information Science at Cleveland State University.
BARBARA BENANDER is an associate professor of Computer and Information Science at Cleveland State University.
ADAM FADLALLA is an associate professor at Cleveland State University in the Computer and Information Science Department.
GREGORY JAMES is manager of Knowledge Discovery Services for National City Corp. and is pursuing his D.B.A. degree at Cleveland State University.

A data warehouse is a repository of integrated information, culled from any number and variety of data sources, including various databases and legacy data sources. The size of a data warehouse is usually massive and the data warehouse typically stores a wide range of information that has been generated over long periods of time. Data related to business subjects such as products, markets, and customers are all collected, integrated, and housed under the data warehouse umbrella. When this vast wealth of information is interfaced to decision support tools that offer powerful data access and analysis capabilities, the data warehouse can be fully exploited by its users.

There are a number of data warehousing tools and products currently on the market. All of these products provide relatively easy access to the data warehouse, both at the enterprise and the data mart level. All of these tools also provide useful facilities for administering and managing a data warehouse. However, creation, maintenance, and daily administration of data warehouses are still formidable tasks that are far from being fully automated.

The successful administration and management of a data warehouse requires skills and expertise that go beyond those required of a traditional database administrator (DBA). There is a need for the creation of a position that is designated a “data warehouse administrator” (DWA). The tasks of a DWA encompass those of a traditional DBA, but the DWA’s job is considerably more complex because of the nature of the data warehouse and its position within an enterprise data architecture.

DATA WAREHOUSE DATABASES VS. OPERATIONAL DATABASES

In order to better understand the distinction between the roles of a DBA and a DWA, it is imperative to understand the significant differences between an operational database and a data warehouse. They differ not only in their ultimate purposes, but also in the types and amount of data stored in them and the methods by which they are populated, updated, and accessed. Some distinguishing characteristics are summarized in Exhibit 1.

There are three main types of data warehouse: operational data stores (ODS), enterprise data warehouses (EDW), and data marts (DM). Operational data stores are the latest evolution of customer information files (CIF). CIFs were
© 2000 CRC PRESS LLC    INFORMATION SYSTEMS MANAGEMENT    WINTER 2000

EXHIBIT 1 Operational Databases vs. Warehouse Databases

Characteristic  Operational Databases                           Warehouse Databases
Users           All users                                       Executives, analysts, customer service representatives
Goal            Recordkeeping (OLTP)                            Information analysis (OLAP, etc.), decision support, database marketing
Update          Online                                          Batch
Query level     Atomic, detailed                                Aggregated, summarized, integrated
Time horizon    Mainly present                                  Historical + present + future
Data source     Internal                                        Internal + external
Orientation     Entity oriented (product, account, customer …)  Category oriented (product type, account type, customer segment …)
Data volumes    Gigabytes                                       Gigabytes/terabytes
Process         Transaction driven                              Analysis driven
Structure       Relatively static                               Dynamic

widely deployed in the past to maintain all common customer information in one central location. Updates to this information are entered into the CIF, and then all other systems that share this data are synchronized to it. Name, address, and phone number are good examples of shared customer information. Today’s more sophisticated ODS performs the same basic function as the CIF but is also actively used by customer support and sales personnel. It is the nexus for all marketing and sales activities, and it contains information pertaining to current, prospective, and past customers. In addition to the basic customer information, the ODS also tracks all recent customer activity and marketing interactions with the customer. ODSs are commonly deployed by corporations reorganizing away from product-line-oriented structures to customer-oriented structures.

Enterprise data warehouses are distinguished from operational data stores by the scope and quantity of the data they contain. They typically contain product, accounting, and organizational information as well as customer information. Where an ODS may typically contain 12 to 18 months of data, an EDW will store three to five years of data or more. Many EDWs are not accessed directly by end users because of their size and complexity. Their role is to store the integrated, scrubbed, historical information that is used as the standard data repository from which multiple, special purpose data warehouses (i.e., data marts) are populated. These secondary data warehouses are then accessed by end users with decision support and ad hoc query tools.

A data mart is a subject-oriented or department-oriented data warehouse whose scope is much smaller than that of an enterprise data warehouse. Its purpose is to provide decision support for a particular department within an enterprise or about a particular subject within an enterprise. Many companies decide to first implement individual data marts before committing to a full-scale EDW. It has been estimated that a data mart typically requires an order of magnitude (1/10) less effort than that of an EDW. In addition, it takes months, as opposed to years, to build, and costs tens or hundreds of thousands of dollars, versus the millions of dollars needed for an EDW [7].

All three types of data warehouses represent data in fundamentally different ways than do operational databases. In an operational database that has been designed using traditional relational database design principles, data is represented as tables, whereas in a data warehouse, there are many instances where data is most naturally represented as cross-tabulations. As an example, Exhibit 2 shows how sales data might be represented using a relational database design for an operational database. Exhibit 3 shows a two-dimensional representation of the same data. The two dimensions are Product (P1, P2, P3, P4) and Region (E, C, W). Adding a third dimension, such as Time (e.g., 1999, 2000), upgrades the two-dimensional representation to a cube, where a cell of the cube represents “sales of a specific product, in a specific region, for a specific year.”

Differences between operational databases and data warehouse databases also manifest themselves sharply in their physical designs. An important goal for an operational database is to provide short update and query transactions.

EXHIBIT 2 A Sample Relational Sales Database

Product Region Sales


P1 E 20
P1 C 30
P1 W 10
P2 E 30
P2 C 25
P2 W 15
P3 E 60
P3 C 50
P3 W 40
P4 E 10
P4 C 40
P4 W 20

The physical structure of an operational database is designed to ensure update efficiency and data integrity over small sets of related data. On the other hand, data warehouses are typically used to answer analytic questions in a user-friendly, ad hoc query environment. The physical structure of a data warehouse is designed to provide data loading and storage efficiency and fast ad hoc query response times. Special indexing techniques (for example, bitmap indexes) and physically partitioned schemas are commonly used in data warehouse implementations.

Because of its enormous size, as well as its inherent nature, a data warehouse requires very close monitoring in order to maintain acceptable efficiency and productivity. As more and more users access the warehouse, the warehouse grows both structurally and in size, causing potential system performance problems. The data structure may need to change over time as additional data is kept for analysis. Some physical structures may grow dramatically as the historical content increases. Unusual business events such as mergers and acquisitions and new product introductions may require large system conversions or enhancements to be implemented. All of these changes require specialized tools and procedures to facilitate version control, monitoring, and system performance tuning.

Many data warehouses are designed using a star schema model. A star schema includes a central “fact” table surrounded by several “dimension” tables. The fact table contains a large number of rows that correspond to observed business events or facts. The dimension tables contain classification and aggregation information about the central fact rows. The dimension tables have a one-to-many relationship with rows in the central fact table. The star schema design provides extremely fast query response time, simplicity, and ease of maintenance for read-only database structures. However, star schemas are not well suited for online update operations. Specialized data warehousing tools have been built to explicitly support star schema architectures. The more recent versions of general-purpose relational systems such as IBM DB2, Oracle, and Microsoft’s SQL Server now support these structures as well.

EXHIBIT 3 A Possible Data Warehouse Version of the Database of Exhibit 2

       E    C    W
P1    20   30   10
P2    30   25   15
P3    60   50   40
P4    10   40   20
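The step from the relational rows of Exhibit 2 to the cross-tabulation of Exhibit 3 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names cross_tabulate and format_cross_tab are hypothetical, and a real warehouse product would do this with its own storage and query engine.

```python
from collections import defaultdict

# Sales figures copied from Exhibit 2: (product, region, sales).
EXHIBIT_2 = [
    ("P1", "E", 20), ("P1", "C", 30), ("P1", "W", 10),
    ("P2", "E", 30), ("P2", "C", 25), ("P2", "W", 15),
    ("P3", "E", 60), ("P3", "C", 50), ("P3", "W", 40),
    ("P4", "E", 10), ("P4", "C", 40), ("P4", "W", 20),
]


def cross_tabulate(rows):
    """Pivot (product, region, sales) rows into {product: {region: sales}}."""
    table = defaultdict(dict)
    for product, region, sales in rows:
        table[product][region] = sales
    return {product: dict(cells) for product, cells in table.items()}


def format_cross_tab(table, regions=("E", "C", "W")):
    """Render the pivoted data in the Product-by-Region layout of Exhibit 3."""
    lines = ["    " + " ".join(f"{region:>4}" for region in regions)]
    for product in sorted(table):
        cells = " ".join(f"{table[product][region]:>4}" for region in regions)
        lines.append(f"{product:<4}{cells}")
    return "\n".join(lines)


cross_tab = cross_tabulate(EXHIBIT_2)
print(format_cross_tab(cross_tab))
```

Adding a third dimension such as Time would amount to keying each cell by (product, region, year) instead of (product, region), which is the cube described in the text.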


It is evident that a DBA who possesses knowledge and skills pertinent to only a traditional operational database will not be able to administer and manage a data warehouse. A person knowledgeable about all the various complexities of a data warehouse (i.e., a DWA) is needed.

DATA WAREHOUSE ADMINISTRATION AND MANAGEMENT TASKS

Traditional database administration methods and techniques for operational database applications performed by the DBA have been developed and refined over 20 years of industrywide use. Robust tools exist to aid in all aspects of the job of a DBA. Data warehousing technology, on the other hand, has only recently moved beyond its initial stages, where most efforts are the first of their kind. Database management systems have been enhanced significantly to enable the creation of large-scale data warehouse databases, and physical data storage systems have been enhanced to handle the extremely large capacities as well. Specialized data warehouse administration tools and techniques are also being developed to handle the unique system requirements of data warehouses.

All of the usual operational database administration tasks apply to a data warehouse environment. However, because of the differences between an operational database and a data warehouse previously mentioned, many of these tasks are performed differently, focus on different objectives, or are more complicated. For example, whereas a DBA is generally concerned with small, frequent, online updates to the database, the DWA considers large, less frequent, batch updates to the database. It is critical to the success of a data warehouse that large, batch updates are accomplished as quickly as possible in order to make the refreshed data warehouse available as soon as possible. In many situations, the currency and availability of the data warehouse provides a significant competitive advantage. For example, a business’s ability to react quickly to a sudden shift in consumer behavior may result in significant customer retention or in new customer acquisition. Frequent updates to the data warehouse may be needed to support the decision-making cycle, which may be monthly, weekly, or even daily.

Planning

A DWA must plan for a potentially enormous growth rate of the data warehouse, anticipating growth rates that are much higher than a DBA would experience. In fact, a recent Gartner Group management report stated the following: “It is not unusual for organizations’ warehouses to experience a 100 percent growth-rate per year. This is not the typical scenario that DBAs take into account when sizing databases for production. Compound this with the fact that adding even incremental amounts of data increases the usefulness of the overall warehouse. Therefore, moderately small additions may result in a multiplier effect for queries. In the end, what one might experience is a drastic growth in not only the size of the warehouse but also the usage, and necessarily its ongoing administration (security, tuning, etc.).”

Clearly such future expansion must be taken into account by the DWA when planning a data warehouse. He or she should consider using system architectures to support such volumes of data; these include SMP (Symmetric Multiprocessing) and MPP (Massively Parallel Processing) platforms, which introduce new levels of design complexity (parallelism of database operations and storage). DB2 Parallel, Informix XPS, and Oracle Parallel offer versions that support parallel operations [8].

Also as part of the planning stage, the DWA must be involved with data-related issues that pertain to data inconsistencies and data semantics. Source data inconsistencies pose potentially serious problems to the integrity and functionality of the data warehouse. For example, even though a particular data field is not essential for all business units, it must be entered properly by all business units in order for the integrated views within the warehouse to be accurate. As an example, suppose that manufacturing batch numbers are not important to the sales department, but are extremely important for manufacturing and customer service to be able to track down all customers using a particular batch of a product. If the sales department does not enter the batch numbers, or enters them inconsistently in their system, then tracking usage from sales back through the manufacturing process might be impossible.

DWAs also need to deal with issues regarding data semantics. Data and its meaning evolve over time. What was a preferred customer last year may not be a preferred customer this year by virtue of a change in business policy. Consequently, the DWA must pay special attention to high-level business processes and to their effects upon the database structure (process-enforced integrity vs. structure-enforced integrity). Processes that manage and maintain data in all of the source systems need

to be examined when they change to ensure that the meaning of the data has not also changed. This could affect the fundamental meaning of the data (as it is defined by its source) or its integrated meaning (as it is defined within the warehouse where many more relationships are maintained).

Another difference in the challenges facing the DWA as opposed to a DBA is in the area of planning ahead for storage management. Whereas a typical corporate database might consume several gigabytes of storage, a data warehouse may require several terabytes. Because of the warehouse’s enormous size, parallel processing is the norm when dealing with a data warehouse. To achieve acceptable access rates and minimize data loss, the data in a data warehouse generally is distributed over multiple disks, with concurrent access. Because of these and other considerations, the traditional storage management tasks of a DBA are significantly expanded in a data warehouse environment.

Another issue with which a DWA must be concerned is the inclusion of potentially large volumes of external data. The data in a data warehouse is an information asset that may be augmented, thereby appreciating in its overall value. This is often accomplished by adding external data acquired from outside organizations to enrich the data warehouse. Census information, credit reports, purchasing habits, demographics, and more are all available for purchase. When external data is added to a data warehouse, organizations can begin using the augmented data related not only to their existing customers, but also to potential new customers that fall within their marketing profiles. This is an important capability for firms wishing to expand their market share. Thus, to truly leverage the power of a data warehouse, the DWA must plan for the future inclusion of valuable external data.

Design

The design goals for a data warehouse differ from those of an operational database. The major design goal for an operational database is to meet business operational data requirements, while the major design goal for a data warehouse is to meet business analysis informational requirements. Consequently, the DWA cannot simply rely on conventional database design techniques such as E-R diagrams, transaction processing constraints, database normalization, and the like. A design approach for the data warehouse uses the business subject-oriented approach.

Data warehouses move beyond data processing to decision-support and knowledge-based applications. In this mission, the semantic complexity of the underlying database structure becomes even more complex. This additional information requirement often takes the form of more complex relationships among base tables, which in turn require more database keys and indexes. The need to index candidate keys is associated with large tables of the kind that pervade data warehouses. Indeed, multidimensional databases, used to deploy data warehouses, often require more space to store their indexes than they do for their main fact tables. Some warehouse designers implement indexes on columns with large domains as a substitute for commonly executed select operations that are known a priori to produce small result sets. In such situations, the overhead of building and maintaining the index is smaller than executing the same “where” clause over and over.

The design of a data warehouse is intended to provide integrated views of business objects such as “Customer.” The working set of data values that define “Customer” may not all arrive at the warehouse at one time. Therefore, integrity constraints that might be assumed by the user of the warehouse might not be practically enforced during update. The DWA must be sensitive to the temporal aspects of the data warehouse. With time, the meaning of classifications can change. For example, the classification “Middle Income” in 1979 is probably different from “Middle Income” in 1999. But an important issue is whether the 1979 or the 1999 data is stored in historical records. If the original data used to derive the classifications is no longer available, then there is no ability to adjust it in the future.

Data warehouse designs must be robust to deal with incompleteness and inconsistency, unlike OLTP and operational database designs. The potential for null values has a significant impact upon query strategies and adds complexity to the warehouse architecture. Does one impose restrictions on the warehouse and disallow nulls, potentially excluding a significant portion of useful, albeit incomplete, data? Or does one allow null values within the warehouse and deal with the potential incompleteness on the query side? As the number of data sources grows (e.g., multiple operational databases that supply data to the warehouse), the potential for “disintegration” increases. This permits a situation in which the total working set of attributes in the

integrated views within the warehouse are not totally populated. Missing values may arise through historical discrepancies (the data was not collected during that time period), through incomplete view integration (inconsistent candidate keys), or nonmandatory values (fields that are important for one application may not be important for another application).

Some argue that the RDBMS design approach can fail to deliver adequate performance as a result of the massive size and complexity of the data warehouse. For example, typical queries against the data warehouse are complex and ad hoc, often submitted by high-level managers in the organization. Answering highly complex business analysis queries may require a large number of joins of huge tables. This creates huge, temporary tables and is very time-consuming. Expensive hardware solutions do exist, including parallel database servers. Furthermore, the complexity of the SQL statements would likely be an impediment to the nonsophisticated casual query writer. To solve this problem, the DWA could create canned, optimized queries. This solution, however, is time-consuming for the DWA and does not offer maximum flexibility, in terms of range of available queries, to the users.

An alternative design model for a data warehouse is the multidimensional database (MDD). In the MDD model, data is stored in a multidimensional array (hypercube) to allow users to access, aggregate, analyze, and view large amounts of data quickly. MDDs give flexible access to large amounts of data and do it quickly by “pre-digesting” data. Several products exist for creating and accessing MDDs. These include Essbase, Express, and LightShip Server. MDDs usually keep the data warehouse size to a minimum on the server through data compression and provide client front-end tools. Clearly, the DWA needs to be familiar with the various new technologies for implementing a data warehouse.

Implementation

Once the DWA has planned and designed the data warehouse, he or she must choose an implementation approach. It is the authors’ opinion that the DWA should use an incremental development approach, rather than an enterprise approach, in creating a successful data warehouse. The size and complexity of an enterprise data warehouse makes an incremental approach the preferred strategy. For example, one successful strategy is to approach each business function (e.g., marketing, sales, finance) that will use the data warehouse and build the data model one function at a time. With this approach, the DWA simply creates data marts for each department. When completed, the enterprise data warehouse will comprise the integration of all the data marts.

Those who argue against the incremental approach cite the inherent problems involved in combining several data marts into a single data warehouse. The integration of these data marts, they maintain, is more difficult than initially creating a single enterprise data warehouse. The authors contend, however, that the complexity and cost involved with the enterprise approach is not worth the risk. With the vast array of new technologies and modeling approaches, it is a judicious approach for a DWA to first become expert at creating a data mart before attempting to create the enterprise data warehouse.

Implementing a data warehouse requires loading of data, which is a far more complex task for a data warehouse, as compared with a conventional database. The first major challenge for the DWA is locating all data sources for the data warehouse. As mentioned previously, a data warehouse is created by using a variety of data sources as input. These can include relational databases, hierarchical databases, network databases, conventional files, flat files, news wires, HTML documents, and legacy systems. The tasks of identifying and locating all sources are the responsibility of the DWA. With frequent corporate downsizing and the high turnover rate in some businesses, it may prove quite a challenge for the DWA to identify and locate these important inputs to the data warehouse.

After locating all of the data sources for the data warehouse, the DWA must ascertain the various formats in which the data to be input to the data warehouse is stored. The heterogeneous nature of the data, originating from multiple production and legacy systems, as well as from external sources, creates another challenge for the DWA. All of the data to be stored in the data warehouse must be reformatted to conform to the chosen data model of the data warehouse. A number of management tools are available to assist in this task, including those from such vendors as Carleton, Platinum Technology, and Prism Solutions.

Data quality is a much greater challenge in a data warehouse setting than in an operational database setting. The authors’ experiences have shown that nearly 20 percent of the data is potentially inconsistent for one reason or

another. For example, it may happen that policies than the DBA counterpart. The DWA
address changes are not captured, records will be in frequent communication with the
belonging to the same customer are not merged business managers and must understand clearly
properly within the warehouse because of incon- their needs and requirements.
sistent candidate keys, or old errors in nonessen- Many standard data warehouse queries spec-
tial fields are replicated into the warehouse ify aggregation values, and many of these aggre-
A DWA
without notice. The data quality problem
becomes evident on the query side when incon-
gations are requested over and over again. In
order to minimize the number of full-table
must be much sistent results are returned. One particular exam- scans, the DWA should look for often-used
ple occurred at a company that queried its data aggregates and store them as permanent values
more familiar
warehouse for duplicate Social Security num- in aggregate tables within the warehouse. These
with the bers. The system returned one SSN several hun- tables are refreshed at the same time the ware-
business rules dred times. Upon investigation, it was found house is updated, which requires more process-
that a particular business office had entered the ing during load but, over the long term,
and policies
SSN of their branch manager, instead of the cus- eliminates redundant scans to produce the
than the DBA tomers’, for all special accounts because a favor- same aggregate values even though the base
counterpart. ite production report was sorted by customer tables have not changed.
SSN rather than by manager!
There are many other data quality challenges inherent to heterogeneous data. For example, different production systems may represent names differently, use different codes that have the same meaning, and store the same attribute in different formats (e.g., many formats exist for storing the date field).

Another issue that the DWA must address is the possibility of changes in the information source. Detection of changes in the information source can vary in importance, depending on the nature of the data warehouse. For example, when it is not important for the warehouse to be current and it is acceptable for the data warehouse to be off-line occasionally, then ignoring change detection may be acceptable. However, if currency, efficiency, and continuous access are required, then these changes must be propagated up to the data warehouse. How does the DWA detect changes in the input data sources and propagate them up to the data warehouse? It depends on the nature of the data source. For example, if a source has active database capabilities (such as triggers), then change notification can be programmed automatically. If a data source maintains a log that can be inspected, then inspection of the log can reveal changes. Detection of change in input sources and propagation to the data warehouse is simply an additional function of the DWA, a task not normally required of a traditional DBA.

Another traditional database task that is expanded upon for the DWA is the creation of various views that are appropriate for the variety of users. Whereas the DBA creates operationally oriented views, the DWA must create views for managers that are oriented toward business decision making. The DWA must be much more familiar with the business rules and policies than the DBA counterpart.

Maintenance

In traditional database administration, the DBA is responsible for ensuring the security and integrity of the database, defining user views, and setting access permissions to the database. The responsibilities of a DBA also include implementing policies and procedures related to database security. For a data warehouse, security issues are even more critical. For example, if a business failure occurred in some aspect, many queries would need to be answered, such as “What was the major contributing factor?” Who should be allowed to ask such queries and obtain the answers? The very survival of the business could well depend on proper security in the data warehouse, provided for by the DWA.

Some organizations are sensitive to internal access. For example, at a large healthcare organization, various medical departments were concerned about physicians from other departments viewing specialized test results, making assumptions about the results’ meaning, and coming to erroneous conclusions. There are many situations where the data might imply one thing but in reality mean something quite different. In these circumstances, knowledge about the data and how it came into existence is not captured within the warehouse. Special security measures put in place by the DWA are required to mitigate these risks.

The “knowledge about the data,” or metadata, is important information that must be administered by the DWA. Metadata, which is the syntactic, semantic, and pragmatic documentation of the data in the data warehouse, is crucial for the success of a data warehouse. The DWA must be familiar with the different

aspects of data: (1) syntactics, which defines the structure of the database; (2) semantics, which defines the meaning of the data; and (3) pragmatics, which defines the source and life cycle of the data (effective date, source system, etc.). Business metadata defines relatively static business objects and concepts such as branch office code or product code. This data is typical of commonly used “look-up” tables. Business metadata, for example, would keep track of how many systems maintain their own copies of state codes and how many systems maintain copies of product codes.

Finally, backup and recovery of a database that is the size of a data warehouse requires much more efficient techniques than a DBA would use. Because of the enormous size of a data warehouse and the resulting time that can be consumed in the backup process, the DWA must carefully plan backup strategy during the design phase of the data warehouse. For example, the planned use of read-only tablespaces, the use of a strategic partitioning scheme, and the judicious choice of partition size are critical to the success of efficient backup. In the complex world of a data warehouse, it is especially crucial to perform testing on backup strategies. As far as recovery is concerned, the DWA must have a recovery plan for each possible failure scenario.

The DWA must be familiar with the hardware and software needed to implement backup and recovery strategies. For example, there are a large number of tape backup options on the market, including a variety of tape media, standalone tape drives, tape stackers, and tape silos. Of course, given the size of the data warehouse, disk backup is not as feasible as tape backup, but, if so desired, disk technology can be used to accomplish disk-to-disk backup and mirror breaking. Backup software packages include OmniBack II (HP), ADSM (IBM), Alexandria (Sequent), Epoch (Epoch Systems), and Networker (Legato). It is possible to achieve different combinations of backup hardware and software that can yield anywhere from 10 GB per hour to 500 GB per hour.[8] Again, due to the size of the data warehouse, performance of the backup process becomes an important issue. To minimize the backup time, parallelism is used. Exhibit 4 summarizes some of the differences between the tasks of a DBA and those of a DWA.

The Organizational Role of a Data Warehouse Administrator

As discussed in the foregoing text, the DWA must be familiar with high-performance software, hardware, and networking technologies. Equally important to the technological expertise of a DWA is the possession of strong business acumen. Data warehouses are dynamic resources used by business managers and decision makers to analyze complex business situations. The DWA must be familiar with the relevant decision-making processes in order to properly design and maintain the data warehouse structure. The more specialized the decision support requirements, the more design tradeoffs the DWA must consider. The DWA should possess knowledge of the business organization itself as well as general business principles for which the data warehouse is being developed. It is especially critical for a DWA to balance the current requirements and capabilities of the data warehouse while providing for the rapid deployment of enhancements.

Finally, the DWA must possess excellent communication skills. This is more critical for a DWA than for a DBA, because the DWA will be closely interacting with the managers in using the data warehouse effectively. The DBA often needs only to communicate with programmers and other technical personnel who act as intermediaries in the specification and design process. This is not usually the case for the DWA, who must participate in the business meetings directly in order to meet the fast development life cycles of a typical data warehouse project. If the DWA cannot communicate in the language of nontechnical people, the data warehouse project will suffer from misunderstanding and delay.

FUTURE TRENDS

Data warehousing has given a competitive edge to a number of enterprises that have, in recent years, employed this technology. Whether the data warehouse is an operational data store, a data mart, or a full-blown enterprise data warehouse, the administration and management requirements of the enormous amount of data in a data warehouse differ significantly from those of a traditional DBA managing an operational database.

Because the characteristics (which are determined by the design goals) of a data warehouse are in many ways very different from those of an operational database, the data warehouse administrator (DWA) must possess knowledge and skills that his or her DBA counterpart need not have. Differences in requirements show up in all phases of the data warehouse development: planning, designing, implementing, and

EXHIBIT 4  Tasks of DBA vs. DWA

Task            Dimension                 DBA                                DWA
--------------  ------------------------  ---------------------------------  ------------------------------
Planning        Future growth in size     Moderate                           Drastic
                and usage
                Data consistency and      Transaction-level                  Aggregation-level
                integration
                Storage requirements      Fewer issues                       More issues (e.g., disk arrays
                                                                             and parallelism)
Design          Goals                     Meet business operational          Meet business analysis
                                          data requirements                  informational requirements
                Methodology               Mainly ER diagrams                 Mainly dimension maps
Implementation  Scope                     Application-oriented               Enterprise-oriented
                Approach                  Monolithic                         Incremental
                Granularity               Transaction-based updates          Bulk updates
                Query type                Many predefined reporting          Many ad hoc analysis queries
                                          queries
                Data level                Detailed                           Summarized and aggregated
Maintenance     Security                  View-based                         User-based
                Security risk             Security breaches have             Security breaches have
                                          relatively contained impact        relatively far-reaching impact
                Backup and recovery       Backup less time-consuming         Backup more time-consuming;
                                                                             performance more critical
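
The backup and recovery row of Exhibit 4 can be made concrete with simple arithmetic. The article quotes backup throughput ranging from 10 GB per hour to 500 GB per hour; the sketch below estimates the backup window at each end of that range. The warehouse size (2,000 GB) and the number of parallel streams are hypothetical, the function name `backup_hours` is illustrative, and the calculation assumes that throughput scales linearly with the number of parallel streams, which real systems only approximate.

```python
def backup_hours(size_gb, rate_gb_per_hour, parallel_streams=1):
    """Estimate the hours needed to back up size_gb of data.

    Assumes each parallel stream sustains rate_gb_per_hour and that
    streams scale linearly (an idealization).
    """
    if size_gb < 0 or rate_gb_per_hour <= 0 or parallel_streams < 1:
        raise ValueError("size must be >= 0, rate > 0, and streams >= 1")
    return size_gb / (rate_gb_per_hour * parallel_streams)


warehouse_gb = 2000  # hypothetical warehouse size

slow = backup_hours(warehouse_gb, 10)    # low end of the quoted range
fast = backup_hours(warehouse_gb, 500)   # high end of the quoted range
parallel = backup_hours(warehouse_gb, 10, parallel_streams=4)

print(slow, fast, parallel)  # 200.0 4.0 50.0
```

Under these assumptions the same warehouse takes more than eight days to back up at the low end and four hours at the high end, which illustrates why backup performance is listed as more critical for the DWA.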

maintaining. Exhibit 4 summarizes some of these differences.

The success of a data warehouse depends to a large extent on the DWA, who must be very knowledgeable about current data warehouse hardware and software technology. In addition, because of the nature and purpose of a data warehouse, it is equally important for the successful DWA to possess knowledge of the business organization itself and general business principles as well.

Emerging trends in data warehousing offer competitive advantages to organizations poised to take advantage of them. These exciting new technologies also offer new challenges to the data warehouse administrator, whose responsibility it will be to leverage them for the benefit of the organization.

One emerging area facilitated by the use of data warehouse technology is Business Intelligence. New forms of data are being integrated with data warehouses to provide higher forms of information to decision makers. Large blocks of text, graphics, and even sound are being added to traditional data. Many large telemarketing organizations have deployed systems for their call centers that provide customer service representatives with telemarketing scripts. The systems are able to capture typed notes and customer responses on a real-time basis. This “nonstructured” data is stored in an integrated fashion within the data warehouse environment and is then available the next time contact is made with the customer. It is also used to evaluate marketing strategies and tactics to improve business performance. Newer generation data warehouses that include more varied metadata and complex data types require specialized database features. New indexing structures and query capabilities are being provided by DBMSs to support these nontraditional data types. Querying a database for text, color, or sound has become a reality during the past several years.

In addition, Internet/intranet access allows organizations to expose information within their data warehouses to users outside their immediate physical networks. Some organizations have opened up portions of their data warehouses to customers. Federal Express, for example, allows customers to query their parcel tracking system over the Internet to obtain the status of their shipment. A number of banks and investment companies provide secured, online access to customers’ accounts. Many more organizations have enabled intranet access within their network firewalls to essentially broadcast data warehouse information to employees regardless of where they work and what kind of PC they may have. Intranet access to data warehouses has also become critical to field sales and support personnel to improve

their performance. Web-enabling data warehouses potentially places greater demands on online query response and 24x7 availability, requirements not necessarily taken into account when the data warehouse was originally designed.

The use of data warehousing will continue to grow as more and more companies realize the benefits afforded to them by this technology. The role of the DWA will, to a large extent, determine the success or failure of the data warehouse within the organization.

References

1. Anahory, S. and Murray, D., Data Warehousing in the Real World, Addison-Wesley, Harlow, England (1997), pp. 205–221.
2. Callaway, Erin, “License To Drive,” Inside Technology Training, (July/August, 1998), pp. 18–21.
3. Gardner, Stephen R., “Building the Data Warehouse,” Communications of the ACM, Vol. 41, No. 9, (Sept., 1998), pp. 52–60.
4. Gerdel, Thomas W., “LTV Steel computer methods praised,” The Cleveland Plain Dealer (July 7, 1996).
5. Radding, Alan, “Support Decision Makers with a Data Warehouse,” Datamation, (March 15, 1995), pp. 53–56.
6. Ricciuti, Mike, “Multidimensional Analysis: Winning the Competitive Game,” Datamation, (Feb. 15, 1994), pp. 21–26.
7. van den Hoven, John, “Data Marts: Plan Big, Build Small,” Information Systems Management, (Winter, 1998), pp. 71–73.
8. Weldon, Jay-Louise, “Warehouse Cornerstones,” Byte, (Jan., 1997), pp. 85–88.
