1. Building a data warehouse requires considering quick decision making using all available data, non-technical users, rapid data growth, and competitive pressures. The warehouse must handle changing IT infrastructure and data distribution across servers while accounting for organizational issues.
2. When designing a data warehouse, the designer must consider all components and data sources as well as usage requirements. The data is consolidated from multiple systems into a single database while addressing heterogeneity, historical data use, and data growth. The data content and structure are defined in a data model.
3. Key steps in the design process include choosing the subject matter, identifying dimensions and facts, defining the data structure, and determining query priorities and the query model. Performance also requires balancing
Data Warehousing and Data Mining: Data Warehousing, Business Analysis and On-Line Analytical Processing (OLAP)

For survival and success in business, the following factors are important:
• Quick decisions using all available data.
• The fact that users are not computer experts needs to be considered.
• Rapid growth of data.
• Competition due to the adoption of business intelligence techniques.
• The data warehouse must handle the incompatibility of informational and operational systems.
• At the same time, it has to handle an ever-changing IT infrastructure.

1.2.1 Business Considerations
• According to the requirements of the business, the organization may choose to build separate data warehouses for different departments.
• An individual warehouse of this kind is called a data mart.
• For development of the warehouse, two approaches can be taken:
  1. Top-down approach: The enterprise data model is developed first, considering the various business requirements. The warehouse is built later using data marts.
  2. Bottom-up approach: Individual data marts are built first and are then integrated into the enterprise data warehouse.

Organizational issues
• Generally, organizations can build operational systems efficiently, but building a warehouse has different requirements.
• Data from operational systems as well as from outside sources needs to be considered.
• Building a data warehouse is not just a technical issue; care should be taken to establish the information requirements.

1.2.2 Design Considerations
• When designing a data warehouse, the designer must consider all data warehouse components, all possible data sources and all known usage requirements.
• The data is consolidated from multiple heterogeneous sources into a query database.
• Heterogeneity of data sources, use of historical data and the tendency of the database to grow are the main factors to consider.

Data content
• A data warehouse needs detailed data, but the data needs to be cleaned and transformed to fit the warehouse model.
• The content and structure of the data warehouse can be seen in its data model.
• The data model is a template that describes how information will be organized in the data warehouse framework.

Metadata
• Refer to section 1.1.4 for a detailed description of metadata.

Data distribution
• As data grows rapidly, it becomes necessary to distribute it to multiple servers.
• In this process, distribution of data by subject area, location or time should be considered.

Tools
• Various tools are available to implement a data warehouse. These tools are used for data movement, end-user query, reporting, data analysis, etc.
• Each tool maintains its own metadata, stored in a proprietary metadata repository.
• Care must be taken to ensure that the selected tools are compatible with the data warehouse environment.

Performance considerations
• The data warehouse should support rapid query processing.

Nine decisions in the design of a data warehouse
• Management expects precise and quick responses when enterprise data is processed.
• It is the designer's responsibility to provide answers to all the questions asked by management while still keeping the design simple.
• To facilitate this, the design methodology by Ralph Kimball, known as the "nine-step method", can be used.
• In both the top-down and the bottom-up view, the data warehouse designer should follow these steps. The nine steps are listed in Table 1.2.1 below.

Table 1.2.1: The nine-step method
Step  Decision
1     Choosing the subject matter
2     Deciding what a fact table represents
3     Identifying and conforming the dimensions
4     Choosing the facts
5     Storing precalculations in the fact table
6     Rounding out the dimension tables
7     Choosing the duration of the database
8     The need to track slowly changing dimensions
9     Deciding the query priorities and the query modes
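The early steps of the nine-step method (deciding what a fact table represents, identifying and conforming the dimensions, choosing the facts) can be pictured with a small star schema. The following is only a minimal sketch using Python's built-in sqlite3 module. Every table name, column name and data value is a hypothetical illustration and does not come from the text; the invoice subject matter simply echoes the kind of data mart Kimball suggests starting with.

```python
import sqlite3

# Hypothetical star schema for an invoice data mart. All names and values
# are illustrative assumptions, not taken from the text.
SCHEMA = """
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, category TEXT);
-- Fact table: its multipart key is built from foreign keys, one per
-- dimension table; 'amount' is the numeric fact being measured.
CREATE TABLE fact_invoice_line (
    date_id     INTEGER REFERENCES dim_date(date_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL NOT NULL,
    PRIMARY KEY (date_id, customer_id, product_id)
);
"""


def build_mart():
    """Create an in-memory data mart and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2024-01"), (2, "2024-02")])
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(10, "North"), (11, "South")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(100, "Books"), (101, "Games")])
    conn.executemany("INSERT INTO fact_invoice_line VALUES (?, ?, ?, ?)",
                     [(1, 10, 100, 20.0), (1, 11, 101, 35.0), (2, 10, 101, 15.0)])
    return conn


def sales_by_region(conn):
    """Aggregate the fact along one attribute of the customer dimension."""
    rows = conn.execute(
        """
        SELECT c.region, SUM(f.amount)
        FROM fact_invoice_line AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    return dict(rows)


def is_conformed(dim_a, dim_b):
    """Treat two copies of a dimension as conformed when their row sets
    are identical or one is a subset of the other."""
    a, b = set(dim_a), set(dim_b)
    return a <= b or b <= a
```

A call such as `sales_by_region(build_mart())` groups the invoice amounts by customer region, which is the typical shape of a query against a fact table: join to a dimension, then aggregate the facts. The `is_conformed` helper is one simple way to read the "same or mathematical subset" rule for a dimension shared by two data marts.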
1. Choosing the subject matter of a particular data mart
• The designed data mart should answer important business questions, and it should also be accessible for data extraction.
• According to Kimball, the process can be started by building a data mart consisting of customer invoices and monthly statements.

2. Deciding exactly what a fact table represents
• A fact table is the central table in the design and has a multipart key.
• Each component of the multipart key acts as a foreign key to an individual dimension table.
• After deciding what the fact table represents, the dimensions of the data mart's fact table are decided.

3. Identifying and conforming the dimensions
• Dimensions are a very important part of a data mart. They make the data mart understandable and easy to use.
• While deciding dimensions, the long-range data warehouse should be considered.
• If a dimension occurs in two data marts, the two occurrences should be the same, or one should be a mathematical subset of the other.
• Such a dimension is called a conformed dimension.

1.2.3 Technical Considerations
Various technical issues need to be considered while building a warehouse. Some of them are:
• The hardware platform
• The supporting DBMS
• The communication infrastructure
• Hardware and software support for the metadata repository
• The system management framework for the entire environment

Hardware platforms
The following hardware platform considerations should be taken care of while designing a data warehouse:
• Capacity to handle large volumes of data for decision support applications.
• The data warehouse server should be specialized to handle tasks related to the data warehouse. A mainframe can be used as the data warehouse server.
• A balance needs to be maintained between computing components such as the number of processors and the I/O bandwidth. For this, disk I/O rates and processor capability need to be analyzed.
• Non-uniform distribution of data, or data skew, affects scalability and may overpower even the best data layout for parallel execution.

Data warehouse and DBMS specialization
• To cater to large databases, performance, throughput and scalability are the important requirements for a data warehouse DBMS.
• Relational DBMSs such as DB2, Oracle, Informix or Sybase are used to fulfil the requirements of a data warehouse.
• More specialized databases include Red Brick Warehouse from Red Brick Systems.

Communications infrastructure
• Accessing corporate data from the desktop requires cost and effort.
• Typically, a large bandwidth is required to interact with the data warehouse.

1.2.4 Implementation Considerations
• Implementation of a data warehouse needs integration of many products within the warehouse.
• To build a data warehouse, the following logical steps need to be taken:
  o Business requirement collection and analysis.
  o Data model and physical design for the warehouse.
  o Definition of data sources.
  o Database technology and platform selection for the warehouse.
  o Data extraction, transformation, cleaning and loading into the database.
  o Access and reporting tool selection for the database.
  o Selection of database connectivity software.
  o Selection of data analysis and presentation software.
  o Updating the data warehouse.

Data extraction, clean-up, transformation and migration
• Data extraction is a critical factor in making a successful data warehouse architecture.
• The following selection criteria related to transformation, consolidation, integration and repair of data should be considered:
  o Identification of data in the data source environment is important.
  o Flat files, indexed files and legacy DBMSs should be supported, as most data is still stored in these formats.
  o Merging data from different data stores is important.
  o Data type and character set translation is needed.
  o Summarization, aggregation and similar capabilities are needed.
  o Evaluation of vendor stability is needed.

Vendor solutions
The vendors described below provide more focused solutions to fulfil the requirements of data warehouse implementation.
1. Prism Solutions:
   • Prism Warehouse Manager extracts data from multiple source environments such as DB2, IDMS, IMS, VSAM, etc.
   • Target databases are Oracle, Sybase and Informix.
2. Carleton's PASSPORT: It consists of two components.
   1. The first component collects the file, record and table layouts and converts them into Passport Data Language (PDL).
   2. The second component is used to create the metadata directory, which is used to build COBOL programs that create the extracts.
3. Information Builders Inc.: Its products provide SQL access and a uniform relational view of relational and non-relational data in 60 different databases on 35 different platforms.
4. SAS Institute Inc.: SAS System tools are used for all data warehousing functions.

Metadata
• Refer to section 1.1.4 for the details.

1.2.5 Integrated Solutions
Vendors provide a suite of services and products for establishing a data warehouse. Some of the vendors are as follows:
1. Digital Equipment Corp.: They use Prism Warehouse Manager for data modeling, extraction and cleansing capabilities, and ACCESS WORKS as database servers, to give users the ability to build and use information warehouses.
2. Hewlett-Packard: They give single-source support for the full HP open warehouse solution. HP open warehouse consists of a data management architecture, the HP-UX operating system, HP 9000 computers, warehouse management tools, the Allbase/SQL relational database and the HP information access query tool.
3. IBM: The IBM information warehouse includes:
   • Data management tools
   • OS/2, AIX and MVS operating systems
   • Hardware platforms
   • Relational database
   Other components are:
   o Data Guide/2 (catalog of shared data and information objects)
   o Data propagation
   o Data refresher
   o Data hub
   o Application system and personal application system
   o Query management facility
   o IBM FlowMark
4. Sequent: Sequent Computer Systems Inc. has a DecisionPoint program for delivering data warehouses. It has the Sequent symmetric multiprocessing (SMP) architecture with client/server products and services such as the UNIX-based Sequent Symmetry 2000 series, Red Brick Warehouse from Red Brick Systems and the ClearAccess query tool from ClearAccess Corp.

1.2.6 Benefits of Data Warehousing
There are two major benefits of a data warehouse architecture:
1. The availability of business intelligence data is increased.
2. Business decisions can be made more effectively, considering timeline constraints.

The benefits can be categorized as:
1. Tangible benefits
2. Intangible benefits

1. Tangible benefits
• One of the major benefits is that out-of-stock conditions can be improved.
Some additional benefits are:
• It provides a big picture of purchasing and inventory patterns, which facilitates cost saving.
• Business intelligence can be enhanced by proper market analysis.
• Cost-effective decisions can be made by separating ad-hoc query processing from operational databases.
• Target market selection is improved, resulting in a decrease in the cost of product introduction. An improvement in product inventory turnover is also observed.

2. Intangible benefits
The following are the intangible benefits of a data warehouse:
• As all the required data can be kept at a single location, productivity is improved.
• Overlap in decision support applications is reduced by a reduction in redundant processing.
• Customer relations are enhanced, as individual requirements and trends can be better understood.
• Useful insights are provided into work processes, through which the processes can be re-engineered by the inclusion of innovative ideas.

1.3 Database Architectures for Parallel Processing
• Parallel architectures include parallel hardware, on which software parallelism can be exploited along with a parallel operating system.
• A suitable parallel database software architecture is required to take advantage of shared memory and distributed memory parallel environments.
• The choice of parallel database software architecture decides the scalability of the solution.
• The three main DBMS software architectures are:
  o Shared-everything architecture
  o Shared-disk architecture
  o Shared-nothing architecture

1.3.1 Shared Everything Architecture (Shared Memory Architecture)
• A parallel platform in which all the processors access a common data space is called a shared memory platform.
• Processors interact with each other by accessing and modifying the data elements stored in the shared address space.
• It is the traditional approach to implementing an RDBMS on SMP hardware.
• As shown in Fig. 1.3.1, a single system image is provided to the user.

[Fig. 1.3.1 Shared memory architecture: processor units (PU) connected through an interconnection network to a global shared memory]

• All the processors, the memory and the entire database are utilized by a single RDBMS server.
• The SQL statements executed by multiple database components are communicated to each other by exchanging messages.
• The data is partitioned on local disks, which can be accessed by all the processors.
• The scalability depends on how the processes are designed:
  1. Process-based implementation: exploited in Oracle 7.x running on the UNIX platform.
  2. Thread-based implementation: the RDBMS implements its own threads (e.g. SYBASE SQL Server) or uses OS threads (e.g. Microsoft SQL Server running on NT).
• The thread-based architecture provides better scalability due to better resource utilization and fast context switching.
• If the threads are too tightly coupled to the operating system, RDBMS portability is limited.

The disadvantages of this architecture are:
• Scalability is limited.
• Throughput is limited, as it depends on processor and system bus speed.

1.3.2 Shared Disk Architecture
• As shown in Fig. 1.3.2, this architecture uses the concept of a distributed memory system.

[Fig. 1.3.2 Distributed memory shared disk architecture: processor units (PU), each with local memory, connected through an interconnection network to a global shared disk subsystem]

• The RDBMS servers running on the nodes share the entire database.
• Records are read, written, updated and deleted by each RDBMS server.
• A distributed lock manager (DLM) is used for coordination.
• A single system image is provided by hiding the DLM components, which are found in hardware, the OS, software layers, etc.
• This scenario poses a synchronization challenge: if the servers are reading and updating the same data, resources are wasted in synchronization.
• One more drawback is that if utilization of the RDBMS servers is high, the DLM becomes a bottleneck.

Some of the advantages of this architecture are:
• System availability is increased, as the bottleneck due to uneven distribution of data is reduced.
• DBMS dependency on data partitioning is reduced due to the reduction in the memory access bottleneck.

Examples of this architecture are:
• Oracle Parallel Server and DB2/MVS running in IBM's Parallel Sysplex.

1.3.3 Shared-nothing Architecture
• As shown in Fig. 1.3.3, in the shared-nothing architecture each node owns its disk, and the data is partitioned across these disks.
• The DBMS is also partitioned into co-servers, which reside on these nodes.
• An SQL query is executed across the nodes in parallel.
• This architecture is suitable for MPP and cluster systems.
• It is a difficult architecture to implement, owing to the need for new compilers and specific programming languages.

[Fig. 1.3.3 Distributed memory architecture: processor units (PU), each with local memory, connected through an interconnection network]

For implementation of a parallel DBMS architecture, the requirements of the shared-nothing architecture are as follows:
• Function shipping support: After parallelization of an SQL query, the decomposed statements should be directed for execution to the processor that holds the data needed by that statement.
• Parallel join strategies:
  o If the rows being joined reside on the same partition, the join is called a colocated join.
  o If the rows reside on different partitions, techniques such as redirected joins need to be adopted, in which rows of one table residing on one partition are moved to the other partition. Alternatively, the rows of both tables are sent to a third node for joining.
• This type of data movement from one node to another needs the following:
  o Support for data repartitioning.
  o Query compilation.
  o Support for database transactions.
  o Support for a single image of the database environment.

1.3.4 Combined Architecture
• It supports inter-server parallelism.
• Each query is parallelized across multiple servers.
• It takes complete advantage of its operating environment.
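The shared-nothing requirements described in section 1.3.3 (function shipping, colocated joins and redirected joins) can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions: nodes are plain Python lists, rows are dictionaries, and the partitioning rule (integer key modulo the number of nodes) as well as all function and column names are hypothetical. It does not reflect how any particular DBMS implements these features.

```python
def owner(value, n_nodes):
    """Assumed partitioning rule: integer key modulo the node count."""
    return value % n_nodes


def partition(rows, key, n_nodes):
    """Place every row on the node that owns row[key]."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[owner(row[key], n_nodes)].append(row)
    return nodes


def local_join(left, right, key):
    """Hash join performed inside a single node."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def colocated_join(left_nodes, right_nodes, key):
    """Both tables are partitioned on the join key, so matching rows
    already sit on the same node and no data has to move."""
    out = []
    for left_part, right_part in zip(left_nodes, right_nodes):
        out.extend(local_join(left_part, right_part, key))
    return out


def redirected_join(left_nodes, right_nodes, key):
    """The right table is partitioned on some other column: ship each of
    its rows to the node owning the join key, then join locally.
    Returns the joined rows and the number of rows that changed node."""
    n = len(left_nodes)
    moved = [[] for _ in range(n)]
    shipped = 0
    for src, part in enumerate(right_nodes):
        for row in part:
            dst = owner(row[key], n)
            if dst != src:
                shipped += 1
            moved[dst].append(row)
    return colocated_join(left_nodes, moved, key), shipped


def ship_function(nodes, key, value, fn):
    """Function shipping: run fn only on the node that owns the key value."""
    node = nodes[owner(value, len(nodes))]
    return fn([row for row in node if row[key] == value])


# Hypothetical sample data: four customers and four orders on two nodes.
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(4)]
orders = [
    {"order_id": 1, "cust_id": 0},
    {"order_id": 2, "cust_id": 1},
    {"order_id": 3, "cust_id": 1},
    {"order_id": 4, "cust_id": 2},
]
cust_nodes = partition(customers, "cust_id", 2)
orders_by_cust = partition(orders, "cust_id", 2)   # colocated with customers
orders_by_id = partition(orders, "order_id", 2)    # not colocated
```

With this data, `colocated_join(cust_nodes, orders_by_cust, "cust_id")` joins without moving any rows, whereas `redirected_join(cust_nodes, orders_by_id, "cust_id")` yields the same joined rows but has to ship some orders to another node first. That shipping is the data movement for which repartitioning support is listed as a requirement.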