
Data Warehouse Development & Schemas

Data Warehouse Development

Data warehouse development approaches


- Inmon Model: EDW approach (top-down)
- Kimball Model: Data mart approach (bottom-up)
- Which model is best?
- There is no one-size-fits-all strategy to DW
- One alternative is the hosted warehouse

Data warehouse structure:

- The Star Schema vs. Relational

Real-time data warehousing?


Prof. Pawan Kumar MBA IV SEM (SEC-A)

Inmon Model: The EDW Approach


- Top-down development
- Spiral development approach
- ERD based
- Inmon insisted that data should be organized into subject-oriented, integrated, non-volatile, and time-variant structures.
- Detailed data is regularly extracted from the ODS (operational data store) and data marts and temporarily hosted in the staging area, where it is aggregated and summarized before being loaded into the data warehouse.
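The extract, stage, summarize, and load flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Inmon's method itself: the record layout, the sample `ods_rows` data, and the function names `extract_to_staging`, `summarize`, and `load` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed records as they might arrive from an ODS.
ods_rows = [
    {"day": "2024-01-01", "region": "North", "amount": 120.0},
    {"day": "2024-01-01", "region": "North", "amount": 80.0},
    {"day": "2024-01-01", "region": "South", "amount": 50.0},
    {"day": "2024-01-02", "region": "North", "amount": 30.0},
]


def extract_to_staging(rows):
    """Copy detailed rows into a temporary staging list."""
    return [dict(row) for row in rows]


def summarize(staged_rows):
    """Aggregate staged rows to one total per (day, region)."""
    totals = defaultdict(float)
    for row in staged_rows:
        totals[(row["day"], row["region"])] += row["amount"]
    return dict(totals)


def load(target, summary):
    """Append summarized rows to the warehouse; existing rows are never updated."""
    for (day, region), total in sorted(summary.items()):
        target.append({"day": day, "region": region, "total": total})


warehouse = []
staging = extract_to_staging(ods_rows)
load(warehouse, summarize(staging))
staging.clear()  # the staging area is only a temporary host
```

Each warehouse row carries its own `day`, which is one simple way to keep the data time-variant, and rows are only appended, which mirrors the non-volatile property.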


Kimball Model: The Data Mart Approach


- Bottom-up approach that uses a bus structure.
- Plan big, build small.
- Subject-oriented or department-oriented data warehouses (data marts), such as marketing or sales.
- This model strikes a good balance between centralized and localized flexibility.
- This architecture makes the data warehouse more of a virtual reality than a physical one: all data marts could be located on one server or on different servers across the enterprise, while the data warehouse is a virtual entity, nothing more than the sum total of all the data marts.
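The idea of a warehouse that exists only as the sum of its data marts can be illustrated with SQLite, where each mart is a separate attached database and the "warehouse" is just a query that spans them. This is a rough sketch under assumptions: the mart names, table names, and the shared `product_id` key (standing in for the common dimensions of the bus structure) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each data mart lives in its own attached database (hypothetical names).
conn.execute("ATTACH DATABASE ':memory:' AS sales_mart")
conn.execute("ATTACH DATABASE ':memory:' AS marketing_mart")

conn.execute(
    "CREATE TABLE sales_mart.fact_sales (product_id INTEGER, revenue REAL)"
)
conn.execute(
    "CREATE TABLE marketing_mart.fact_campaign (product_id INTEGER, spend REAL)"
)
conn.executemany(
    "INSERT INTO sales_mart.fact_sales VALUES (?, ?)",
    [(1, 500.0), (2, 300.0)],
)
conn.executemany(
    "INSERT INTO marketing_mart.fact_campaign VALUES (?, ?)",
    [(1, 100.0), (2, 150.0)],
)

# No physical warehouse table exists: the enterprise view is a query
# that joins the marts on the key they share.
rows = conn.execute(
    """
    SELECT s.product_id, s.revenue, c.spend
    FROM sales_mart.fact_sales AS s
    JOIN marketing_mart.fact_campaign AS c
      ON c.product_id = s.product_id
    ORDER BY s.product_id
    """
).fetchall()

for product_id, revenue, spend in rows:
    print(product_id, revenue, spend)
```

The cross-mart join only works because both marts agree on what `product_id` means, which is the practical reason for planning big before building small.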


DW Development Approaches
(Inmon Approach) (Kimball Approach)


Data Warehouse Schema Architecture

- Star schema
- Snowflake schema
- Fact constellation schema


Star schema

A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.

It contains two types of tables:

- Fact tables: A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which contain numeric facts. A fact table can hold facts at a detailed or an aggregated level.
- Dimension tables: A dimension is a structure, usually composed of one or more hierarchies, that categorizes data. Dimension attributes are normally descriptive, textual values. Dimension tables are generally smaller than the fact table.
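A minimal star schema can be built and queried with Python's built-in `sqlite3` module. The sketch below is illustrative only: the table names (`fact_sales`, `dim_product`, `dim_date`), their columns, and the sample rows are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: small, descriptive, textual attributes.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        category   TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        day     TEXT,
        month   TEXT,
        year    INTEGER
    );
    -- Fact table: foreign keys to the dimensions plus numeric measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        units      INTEGER,
        revenue    REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES
        (10, '2024-01-01', '2024-01', 2024),
        (11, '2024-02-01', '2024-02', 2024);
    INSERT INTO fact_sales VALUES
        (1, 10, 5, 10.0), (1, 11, 3, 6.0), (2, 10, 1, 150.0);
    """
)

# A typical star query: one join per dimension, then aggregate a measure.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

print(revenue_by_category)
```

Note that `category` is stored directly in `dim_product`, repeated for every product in that category. That redundancy is what keeps the query down to a single join per dimension.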


The main characteristics of a star schema:

- Simple structure -> easy-to-understand schema
- High query effectiveness -> small number of tables to join
- Relatively long time to load data into dimension tables -> denormalization and redundant data can make the tables large
- The most commonly used schema in data warehouse implementations -> widely supported by a large number of business intelligence tools


Snowflake schema

The snowflake schema architecture is a more complex variation of the star schema used in a data warehouse, because the tables which describe the dimensions are normalized.
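To make the normalization concrete, the sketch below splits a product dimension into two tables, so that the category name is stored once instead of once per product. Table names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The category attribute is moved out of dim_product into its own table.
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL
    );
    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Desk', 2);
    INSERT INTO fact_sales VALUES (1, 10.0), (2, 4.0), (3, 150.0);
    """
)

# Reaching the category now takes two joins instead of one.
snowflake_totals = conn.execute(
    """
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()

print(snowflake_totals)
```

The extra join is the added complexity the definition refers to; the saving is that renaming a category touches one row in `dim_category` rather than every product row.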


Fact constellation schema

- For each star schema it is possible to construct a fact constellation schema, for example by splitting the original star schema into several star schemas, each of which describes facts at a different level of the dimension hierarchies.
- The fact constellation architecture contains multiple fact tables that share many dimension tables.
- The main shortcoming of the fact constellation schema is a more complicated design, because many variants for particular kinds of aggregation must be considered and selected.
- Moreover, the dimension tables are still large.
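The splitting described above can be sketched with two fact tables at different levels of the time hierarchy (daily and monthly) that share one product dimension. This is a minimal illustration; the table names, columns, and sample data are assumptions rather than anything taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One dimension shared by both fact tables.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT
    );
    -- Detailed facts at day level.
    CREATE TABLE fact_sales_daily (
        product_id INTEGER REFERENCES dim_product(product_id),
        day        TEXT,
        revenue    REAL
    );
    -- Aggregated facts at month level.
    CREATE TABLE fact_sales_monthly (
        product_id INTEGER REFERENCES dim_product(product_id),
        month      TEXT,
        revenue    REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Desk');
    INSERT INTO fact_sales_daily VALUES
        (1, '2024-01-01', 10.0),
        (1, '2024-01-02', 6.0),
        (2, '2024-01-01', 150.0);
    -- Derive the monthly fact table from the daily one.
    INSERT INTO fact_sales_monthly
        SELECT product_id, substr(day, 1, 7), SUM(revenue)
        FROM fact_sales_daily
        GROUP BY product_id, substr(day, 1, 7);
    """
)

monthly = conn.execute(
    """
    SELECT p.name, m.month, m.revenue
    FROM fact_sales_monthly AS m
    JOIN dim_product AS p ON p.product_id = m.product_id
    ORDER BY p.name
    """
).fetchall()

print(monthly)
```

Choosing which aggregation levels deserve their own fact table (daily, monthly, per region, and so on) is the design burden the slide points to: each additional variant is another table to define, load, and keep consistent.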


Best Practices for Implementing DW




- The project must fit with corporate strategy
- There must be complete buy-in to the project
- It is important to manage user expectations
- The data warehouse must be built incrementally
- Adaptability must be built in from the start
- The project must be managed by both IT and business professionals (a business-supplier relationship must be developed)
- Only load data that have been cleansed and are of high quality
- Do not overlook training requirements
- Be politically aware

Risks in Implementing DW

- No mission or objective
- Quality of source data unknown
- Skills not in place
- Inadequate budget
- Lack of supporting software
- Source data not understood
- Weak sponsor
- Users not computer literate
- Political problems or turf wars
- Unrealistic user expectations

Risks in Implementing DW Cont.


- Architectural and design risks
- Scope creep and changing requirements
- Vendors out of control
- Multiple platforms
- Key people leaving the project
- Loss of the sponsor
- Too much new technology
- Having to fix an operational system
- Geographically distributed environment
- Team geography and language culture

Things to Avoid for Successful Implementation of DW


- Starting with the wrong sponsorship chain
- Setting expectations that you cannot meet
- Engaging in politically naive behavior
- Loading the warehouse with information just because it is available
- Believing that data warehousing database design is the same as transactional DB design
- Choosing a data warehouse manager who is technology oriented rather than user oriented

(see more on page 356)

Real-time DW (a.k.a. Active Data Warehousing)

The practice of enabling real-time data updates for real-time analysis and real-time decision making is growing rapidly

- Push vs. Pull (of data)

Concerns about real-time BI

- Not all data should be updated continuously
- Mismatch of reports generated minutes apart
- May be cost prohibitive
- May also be infeasible

Evolution of DSS & DW


Active Data Warehousing (by Teradata Corporation)


Comparing Traditional and Active DW


Data Warehouse Administration

Due to its huge size and its intrinsic nature, a DW requires especially strong monitoring in order to sustain its efficiency, productivity, and security. The successful administration and management of a data warehouse entails skills and proficiency that go beyond what is required of a traditional database administrator.

Requires expertise in high-performance software, hardware, and networking technologies



DW Scalability and Security

Scalability

- The main issues pertaining to scalability:
  - The amount of data in the warehouse
  - How quickly the warehouse is expected to grow
  - The number of concurrent users
  - The complexity of user queries
- Good scalability means that queries and other data-access functions will grow linearly with the size of the warehouse

Security

- Emphasis on security and privacy


End of the Chapter

Questions?
