
1.2 Building a Data Warehouse

For survival and success in business, the following factors are important:
- Quick decisions must be made using all available data.
- The fact that users are not computer experts needs to be considered.
- Data grows rapidly.
- Competition increases due to the adoption of business intelligence techniques.

The data warehouse must handle the incompatibility of informational and operational systems. At the same time, it has to handle an ever-changing IT infrastructure.

1.2.1 Business Considerations

According to the requirements of the business, the organization may choose to build separate data warehouses for different departments. An individual warehouse of this kind is called a data mart.

Two approaches can be taken to develop the warehouse:
1. Top-down approach: The enterprise data model is developed first, considering the various business requirements. The warehouse is then built using data marts.
2. Bottom-up approach: Individual data marts are built first and are then integrated into the enterprise data warehouse.

Organizational issues:
- Organizations can generally build operational systems efficiently, but building a warehouse has different requirements.
- Data from the operational systems as well as from outside sources needs to be considered.
- Building a data warehouse is not just a technical issue; care should be taken to establish the information requirements.

1.2.2 Design Considerations

- When designing a data warehouse, the designer must consider all data warehouse components, all possible data sources and all known usage requirements.
- The data is consolidated from multiple heterogeneous sources into a query database.
- Heterogeneity of data sources, use of historical data and the tendency of the database to grow are the main factors to consider.

Data content
- A data warehouse needs detailed data, but the data must be cleaned and transformed to fit the warehouse model.
- The content and structure of the data warehouse can be seen in its data model.
- The data model is a template that describes how information will be organized in the data warehouse framework.

Metadata
Refer to section 1.1.4 for a detailed description of metadata.

Data distribution
- As data grows rapidly, it becomes necessary to distribute it to multiple servers.
- In this process of data distribution, division by subject area, location or time should be considered.

Tools
- Various tools are available to implement a data warehouse.
- These tools are used for data movement, end user query, reporting, data analysis, etc.
- Each tool maintains its own metadata, stored in a proprietary metadata repository.
- Care must be taken to ensure that the selected tools are compatible with the data warehouse environment.

Performance considerations
The data warehouse should support rapid query processing.

Nine decisions in the design of a data warehouse
- Management expects precise and quick responses when enterprise data is processed.
- It is the designer's responsibility to provide answers to all the questions asked by management while still keeping the design simple.
- To facilitate this, the design methodology by Ralph Kimball, known as the "nine-step method", can be used.
- The nine steps are listed in Table 1.2.1 below.

Table 1.2.1
1  Choosing the subject matter
2  Deciding what a fact table represents
3  Identifying and conforming the dimensions
4  Choosing the facts
5  Storing precalculations in the fact table
6  Rounding out the dimension tables
7  Choosing the duration of the database
8  The need to track slowly changing dimensions
9  Deciding the query priorities and the query modes
Data Warehousing and Data Mining: Data Warehousing, Business Analysis and On-Line Analytical Processing (OLAP)

In both the top-down view and the bottom-up view, the data warehouse designer should follow these steps.

1. Choosing the subject matter of a particular data mart
- The designed data mart should answer important business questions, and it should be accessible for data extraction.
- As per Kimball, the process can be started by building a data mart consisting of customer invoices and monthly statements.

2. Deciding exactly what a fact table represents
- A fact table is the central table in the design and has a multipart key.
- Each component of the multipart key acts as a foreign key to an individual dimension table.
- After deciding what the fact table represents, the dimensions of the data mart's fact table are decided.

3. Identifying and conforming the dimensions
- Dimensions are a very important part of a data mart. They make the data mart understandable and easy to use.
- While deciding the dimensions, the long-range data warehouse should be considered.
- If a dimension occurs in two data marts, the two versions should be the same or one should be a mathematical subset of the other.
- Such a dimension is called a conformed dimension.

1.2.3 Technical Considerations

Various technical issues need to be considered while building a warehouse. Some of them are:
- The hardware platform
- The supporting DBMS
- Communication infrastructure
- Hardware and software support for the metadata repository
- The system management framework for the entire environment

Hardware platforms
The following hardware platform considerations should be taken care of while designing a data warehouse:
- The platform needs the capacity to handle large volumes of data for decision support applications.
- The data warehouse server should be specialized to handle tasks related to the data warehouse.
- A mainframe can be used as a data warehouse server.
- A balance needs to be maintained between computing components such as the number of processors and the I/O bandwidth. For this, disk I/O rates and processor capability need to be analyzed.
- Non-uniform distribution of data, or data skew, affects scalability and may overpower even the best data layout for parallel execution.

Data warehouse and DBMS specialization
- Because it caters to very large databases, performance, throughput and scalability are the important requirements for a data warehouse DBMS.
- Relational DBMS systems like DB2, Oracle, Informix or Sybase are used to fulfil the requirements of a data warehouse.
- More specialized databases include Red Brick Warehouse from Red Brick Systems.

Communications infrastructure
- Accessing corporate data from the desktop requires cost and effort.
- Typically, large bandwidth is required to interact with the data warehouse.

1.2.4 Implementation Considerations

- Implementation of a data warehouse needs integration of many products within the warehouse.
- To build a data warehouse, the following logical steps need to be taken:
  - Business requirement collection and analysis.
  - Data model and physical design for the warehouse.
  - Definition of data sources.
  - Database technology and platform selection for the warehouse.
  - Data extraction, transformation, cleaning and loading into the database.
  - Access and reporting tool selection for the database.
  - Selection of database connectivity software.
  - Selection of data analysis and presentation software.
  - Data warehouse updation.

Data extraction, clean up, transformation and migration
- Data extraction is a critical factor in making a successful data warehouse architecture.
- The following selection criteria related to transformation, consolidation, integration and repairing of data should be considered:
  - Identification of data in the data source environment is important.
  - Flat files, indexed files and legacy DBMSs should be supported, as most of the data is still stored in these formats.
  - Merging data from different data stores is important.
  - Data type and character set translation is needed.
  - Summarization, aggregation and similar capabilities are needed.
  - Evaluation of vendor stability is needed.

Vendor solutions
The vendors described below provide more focused solutions to fulfil the requirements of data warehouse implementation.

1. Prism Solutions:
- Prism Warehouse Manager extracts data from multiple source environments like DB2, IDMS, IMS, VSAM, etc.
- Target databases are Oracle, Sybase and Informix.

2. Carleton's PASSPORT:
It consists of two components:
1. The first component collects the file record table layouts and converts them into Passport Data Language (PDL).
2. The second component is used to create the metadata directory, which is used to build COBOL programs that create the extracts.

3. Information Builders Inc.:
These products provide SQL access and a uniform relational view of relational and non-relational data in 60 different databases and on 35 different platforms.

4. SAS Institute Inc.:
SAS system tools are used for all data warehousing functions.

Metadata
Refer to section 1.1.4 for the details.

1.2.5 Integrated Solutions

Vendors provide suites of services and products for the establishment of a data warehouse. Some of the vendors are as follows:

1. Digital Equipment Corp.:
They use Prism Warehouse Manager for data modeling, extraction and cleansing capabilities, and ACCESSWORKS as database servers, to give users the ability to build and use information warehouses.

2. Hewlett-Packard:
- They give single-source support for the full HP open warehouse solution.
- HP open warehouse consists of a data management architecture, the HP-UX operating system, HP 9000 computers, warehouse management tools, an Allbase/SQL relational database and the HP information access query tool.

3. IBM:
IBM information warehouse includes:
- Data management tools
- OS/2, AIX and MVS operating systems
- Hardware platforms
- Relational database
Other components are:
- DataGuide/2 (catalog of shared data and information objects)
- Data propagation
- Data refresher
- Data hub
- Application system and personal application system
- Query management facility
- IBM FlowMark

4. Sequent:
- Sequent Computer Systems Inc. has a DecisionPoint program for delivering data warehouses.
- It has the Sequent symmetric multiprocessing (SMP) architecture with client/server products and services such as the UNIX-based Sequent Symmetry 2000 series, Red Brick Warehouse from Red Brick Systems and the ClearAccess query tool from ClearAccess Corp.

1.2.6 Benefits of Data Warehousing

There are two major benefits of the data warehouse architecture:
1. The availability of business intelligence data is increased.
2. Business decisions can be made more effectively considering the timeline constraints.
The benefits can be categorized as:
1. Tangible benefits
2. Intangible benefits

1. Tangible benefits
The major tangible benefits are:
- Out-of-stock conditions can be improved.
- It provides a big picture of purchasing and inventory patterns, which facilitates cost saving.
- Business intelligence can be enhanced by proper market analysis.
- Cost-effective decisions can be made by separating the spectrum of ad-hoc query processing from operational databases.
- Target market selection is improved, resulting in a decrease in the cost of product introduction. An improvement in product inventory turnover is also observed.

2. Intangible benefits
Following are the intangible benefits of a data warehouse:
- As all the required data can be kept at a single location, productivity is improved.
- Overlap in decision support applications is reduced by a reduction in redundant processing.
- Customer relations are enhanced, as individual requirements and trends can be better understood.
- Useful insights are provided into work processing, through which the processes can be reengineered by including innovative ideas.

1.3 Database Architectures for Parallel Processing

- Parallel architectures include parallel hardware on which software parallelism can be exploited, along with a parallel operating system.
- A suitable parallel database software architecture is required to take advantage of shared memory and distributed memory parallel environments.
- The choice of parallel database software architecture decides the scalability of the solution.

The three main DBMS software architectures are:
- Shared-everything architecture
- Shared disk architecture
- Shared nothing architecture

1.3.1 Shared Everything Architecture (Shared Memory Architecture)

- A parallel platform in which all the processors access a common data space is called a shared memory platform.
- Processors interact with each other by accessing and modifying the data elements stored in the shared address space.
- It is the traditional approach to implementing an RDBMS on SMP hardware.
- As shown in Fig. 1.3.1, a single system image is provided to the user.

[Figure: four processor units (PU) connected through an interconnection network to a global shared memory]
Fig. 1.3.1 Shared memory architecture

- All the processors, the memory and the entire database are utilized by a single RDBMS server.
- Multiple database components executing SQL statements communicate with each other by exchanging messages.
- The data is partitioned across local disks, which can be accessed by all the processors.
- The scalability depends on the design approach:

1. Process based implementation:
Exploited in Oracle 7.x running on the UNIX platform.

2. Thread based implementation:
- The RDBMS implements its own threads, e.g. Sybase SQL Server, or
- it uses OS threads, e.g. Microsoft SQL Server running on NT.

The thread based architecture provides better scalability due to better utilization and fast context switching. If the threads are too tightly coupled, the result is limited RDBMS portability.

The disadvantages of this architecture are:
- Scalability is limited.
- Throughput is limited, as it is based on processor and system bus speed.

1.3.2 Shared Disk Architecture

As shown in Fig. 1.3.2, this architecture uses the concept of a distributed memory system.

[Figure: four processor units (PU), each with its own local memory, connected through an interconnection network to a global shared disk subsystem]
Fig. 1.3.2 Distributed memory shared disk architecture

- The RDBMS servers running on the nodes share the entire database.
- Records are read, written, updated and deleted by each RDBMS server.
- The distributed lock manager (DLM) concept is used for coordination.
- A single system image is provided by hiding the DLM components found in the hardware, OS or software layers.
- This scenario poses the challenge of synchronization, as the servers may be reading and updating the same data, and resources are wasted in synchronization.
- One more drawback: if the utilization of the RDBMS servers is high, the DLM becomes a bottleneck.

Some of the advantages of this architecture are:
- System availability is increased, as the bottleneck due to uneven distribution is reduced.
- DBMS dependency on data partitioning is reduced, due to the reduction in the memory access bottleneck.

Examples of this architecture are Oracle Parallel Server and DB2/MVS running in IBM's Parallel Sysplex.

1.3.3 Shared-nothing Architecture

- As shown in Fig. 1.3.3, in the shared nothing architecture each node owns its disk, and the data is partitioned across these disks.
- The DBMS is also partitioned into co-owners, which reside on these disks.
- An SQL query is executed across the nodes in parallel.
- This architecture is suitable for MPP and cluster systems.
- It is a difficult architecture to implement, due to the need for new com… and specific programming languages.

[Figure: four processor units (PU), each with its own local memory, connected through an interconnection network]
Fig. 1.3.3 Distributed memory architecture
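The partitioning idea behind the shared nothing architecture can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names (node_for_key, partition_rows, skew_ratio), the choice of a CRC32 hash and the row layout are hypothetical and do not come from the text or from any particular DBMS. It also shows how the data skew mentioned under hardware platforms could be measured.

```python
import zlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a partitioning key to a node using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition_rows(rows, key_field, num_nodes):
    """Place every row on exactly one node; nodes share neither memory nor disk."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for_key(str(row[key_field]), num_nodes)].append(row)
    return nodes


def skew_ratio(nodes):
    """Largest partition size divided by the mean partition size (1.0 means even)."""
    sizes = [len(n) for n in nodes]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0
```

If every row carries the same key value, all rows land on one node and the ratio equals the number of nodes, which is the worst case for parallel execution.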
For implementation of a parallel DBMS, the requirements of the shared nothing architecture are as follows:

- Function shipping support
  After parallelization of an SQL query, the decomposed statements should be directed for execution to the processor that holds the data for that query.

- Parallel join strategies
  If the rows being joined reside on the same partition, the join is called a colocated join.
  If the rows reside on different partitions, techniques like redirected joins need to be adopted, in which the rows of one table residing on one partition are moved to another partition, and in turn the rows of both tables are sent to a third node for joining.

- This type of data movement from one node to another has the following requirements:
  - Support for data repartitioning.
  - Query compilation.
  - Support for database transactions.
  - Support for the single image of the database environment.
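The difference between a colocated join and a join that needs data movement can be sketched as follows. This is a simplified illustration, not the method of any specific DBMS: the cluster size NUM_NODES, the helper names and the sample tables are all hypothetical, and the redirected case is reduced to repartitioning one table on the join key before joining locally.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(key):
    """Stable hash of a key to a node number (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


def partition(rows, key_field):
    """Distribute rows over the nodes by hashing the given column."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field])].append(row)
    return nodes


def local_join(left_rows, right_rows, key_field):
    """Hash join that runs entirely on one node."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[key_field]].append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[key_field], [])]


def colocated_join(left_nodes, right_nodes, key_field):
    """Both tables are partitioned on the join key, so no rows move between nodes."""
    out = []
    for node in range(NUM_NODES):
        out.extend(local_join(left_nodes.get(node, []), right_nodes.get(node, []), key_field))
    return out


def redirected_join(left_nodes, right_nodes, key_field):
    """The right table is partitioned on another column, so its rows are first
    moved (repartitioned on the join key) and then joined locally."""
    all_right = [r for rows in right_nodes.values() for r in rows]
    return colocated_join(left_nodes, partition(all_right, key_field), key_field)
```

For example, an orders table partitioned on customer_id can be joined with a customers table partitioned on customer_id without moving any rows, while a customers table partitioned on region has to be repartitioned first. Both paths return the same joined rows; only the amount of data movement differs.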
1.3.4 Combined Architecture

- It supports inter-server parallelism.
- Each query is parallelized across multiple servers.
- It takes complete advantage of its operating environment.
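Inter-server parallelism for a single query can be pictured as a scatter-gather pattern: the same sub-query runs on every server's slice of the data, and the partial results are combined. The sketch below is a minimal illustration under assumptions; the servers are modelled as in-memory lists, the thread pool stands in for real servers, and the names partial_total and parallel_total are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_total(server_rows):
    """Work done by one server on its own slice of the data."""
    return sum(row["amount"] for row in server_rows)


def parallel_total(servers):
    """Run the same sub-query on every server at once, then combine the partials.

    Assumes at least one server is passed in.
    """
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(partial_total, servers))
    return sum(partials)
```

A call such as parallel_total([server_a_rows, server_b_rows]) returns the same total as summing all rows on one machine; the point of the design is that each server only scans its own data.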
