Data Warehouse

Data Warehouse:
1) Definition
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
- Subject-oriented: the warehouse is organized around the major subjects of the enterprise rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
- Integrated: the source data come together from different enterprise-wide application systems and are often inconsistent, for example in the formats they use. The integrated data source must be made consistent to present a unified view of the data to the users.
- Time-variant: the data in the warehouse is only accurate and valid at some point in time or over some time interval. Time-variance is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
- Nonvolatile: data is not updated in real time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
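The time-variant and nonvolatile properties can be illustrated with a minimal Python sketch. The class name SnapshotStore, its methods, and the sample customer data are hypothetical and chosen only for illustration; the point is that each periodic refresh appends a dated snapshot and never overwrites earlier rows.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: each load adds rows stamped with a snapshot date."""

    def __init__(self):
        self._rows = []  # list of (snapshot_date, key, value)

    def load(self, snapshot_date, records):
        """Add one periodic refresh; earlier snapshots are never overwritten."""
        for key, value in records.items():
            self._rows.append((snapshot_date, key, value))

    def as_of(self, key, when):
        """Return the most recent value for `key` dated on or before `when`."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

    def history(self, key):
        """Return every snapshot held for `key`, oldest first."""
        return sorted((d, v) for d, k, v in self._rows if k == key)


store = SnapshotStore()
store.load(date(2024, 1, 31), {"cust-1": "Leeds"})
store.load(date(2024, 2, 29), {"cust-1": "York"})  # supplements, does not replace
```

Asking for the value as of a date between the two loads returns the January snapshot, while the full history still holds both rows.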
Decision support system: Definition:
Decision Support Systems (DSS) are a specific class of computerized information systems that supports business and organizational decision-making activities. A properly-designed DSS is an interactive software-based system intended to help decision makers compile useful information from raw data, documents, personal knowledge, and/or business models to identify and solve problems and make decisions.
Executive information system:
An Executive Information System (EIS) is a type of management information system intended to facilitate and support the information and decision-making needs of senior executives by providing easy access to both internal and external information relevant to meeting the strategic goals of the organization. It is commonly considered a specialized form of a Decision Support System (DSS).
2) Characteristics of Data Warehouse:
A data warehouse has the following characteristics:
- Multidimensional conceptual view
- Generic dimensionality
- Unlimited dimensions and aggregation levels
- Multiuser support
- Transparency
- Flexible reporting
- Client-server architecture
3) Architecture of data warehouse:
[Figure: Typical architecture of a data warehouse. Components shown: operational data sources 1 to n, operational data store (ODS), load manager, warehouse manager, query manager, DBMS (meta-data, detailed data, lightly summarized data, highly summarized data), archive/backup data, and end-user access tools, including OLAP (online analytical processing) tools.]
Operational data sources: the data for the DW is supplied from mainframe operational data held in first-generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers, and external systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers.
Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
Load manager: also called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
Warehouse manager: performs all the operations associated with the management of the data in the warehouse. The operations performed by this component include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing up data.
Query manager: also called the backend component, it performs all the operations associated with the management of user queries. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
The warehouse also holds detailed data, lightly and highly summarized data, archive/backup data, and meta-data.
End-user access tools: these can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
4) Data flows:
- Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
- Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
- Downflow: the processes associated with archiving and backing up the data in the warehouse.
- Outflow: the processes associated with making the data available to the end users.
- Meta-flow: the processes associated with the management of the metadata.
The critical steps in the construction of a data warehouse:
a. Extraction
b. Cleansing
c. Transformation
After these critical steps, loading the results into the target system can be carried out either by separate products or by a single product.
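The extraction, cleansing, transformation, and loading steps can be sketched in Python as four small functions. This is only an illustration under stated assumptions: the sample rows, the sales table, and the cleansing rules (drop non-numeric amounts and duplicate ids) are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [
    {"id": "1", "region": " north ", "amount": "120.50"},
    {"id": "2", "region": "SOUTH", "amount": "n/a"},   # amount is not numeric
    {"id": "3", "region": "South", "amount": "80"},
    {"id": "3", "region": "South", "amount": "80"},    # duplicate of id 3
]


def extract(source):
    """Extraction: read rows from the source (here an in-memory list)."""
    return list(source)


def cleanse(rows):
    """Cleansing: drop rows with a non-numeric amount and duplicate ids."""
    seen, clean = set(), []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        clean.append(row)
    return clean


def transform(rows):
    """Transformation: convert types and normalize region names."""
    return [(int(r["id"]), r["region"].strip().lower(), float(r["amount"]))
            for r in rows]


def load(conn, rows):
    """Loading: write the prepared rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract(RAW_ROWS))))
loaded = conn.execute("SELECT id, region, amount FROM sales ORDER BY id").fetchall()
```

Keeping each step as its own function mirrors the note that the steps may be handled by separate products; a single product would simply bundle the same stages.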
5) The benefits of data warehousing:

The potential benefits of data warehousing are:
- High returns on investment
- Substantial competitive advantage
- Increased productivity of corporate decision-makers
6) Functionality of data warehouse:
A data warehouse must provide greater and more efficient query support than transactional databases. The data warehouse access component supports enhanced spreadsheet functionality, efficient query processing, structured queries, data mining, and materialized views. The functionalities are:
- Roll-up: data is summarized with increasing generalization (for example, weekly to quarterly to annually).
- Drill-down: increasing levels of detail are revealed (the inverse of roll-up).
- Pivot: cross tabulation is performed.
- Slice and dice: projection operations are performed on the dimensions.
- Sorting: data is sorted by ordinal value.
- Selection: data is available by value or range.
- Derived attributes: attributes are computed by operations on stored and derived values.
7) Data warehouse and views:
Views and data warehouses are alike in that both provide read-only extracts from databases and are subject-oriented. Data warehouses differ from views in the following ways:
- Data warehouses are not usually relational, but rather multidimensional. Views of a relational database are relational.
- Data warehouses can be indexed to optimize performance. Views cannot be indexed independently of the underlying database.
- Data warehouses characteristically provide specific support of functionality; views cannot.
- Data warehouses provide large amounts of integrated and often temporal data, whereas views are an extract of a database.
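The roll-up, drill-down, pivot, and selection operations listed under section 6 can be sketched over a small in-memory fact list. The fact rows, column names, and helper functions below are hypothetical; a real warehouse would run these operations inside its OLAP engine rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "north", "tea", 10),
    (2023, "Q1", "south", "tea", 20),
    (2023, "Q2", "north", "coffee", 30),
    (2024, "Q1", "north", "tea", 40),
    (2024, "Q1", "south", "coffee", 50),
]
COLUMNS = ("year", "quarter", "region", "product")


def aggregate(facts, dims):
    """Sum sales grouped by the named dimensions."""
    idx = [COLUMNS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def select(facts, dim, value):
    """Selection: keep only the facts where `dim` equals `value`."""
    i = COLUMNS.index(dim)
    return [row for row in facts if row[i] == value]


def pivot(facts, row_dim, col_dim):
    """Pivot: cross-tabulate sales with row_dim as rows and col_dim as columns."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)


by_quarter = aggregate(FACTS, ("year", "quarter"))  # detailed level (drill-down)
by_year = aggregate(FACTS, ("year",))               # generalized level (roll-up)
north_only = select(FACTS, "region", "north")
cross_tab = pivot(FACTS, "region", "product")
```

Moving from by_quarter to by_year is a roll-up; reading the same two results in the opposite order is a drill-down.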
8) Data mart:
A data mart is a subset of a data warehouse that supports the requirements of a particular department or business function.
- A data mart focuses only on the requirements of users associated with one department or business function.
- Unlike data warehouses, data marts do not normally contain detailed operational data.
- Because data marts contain less data than data warehouses, they are more easily understood and navigated.
Reasons for creating a data mart:
- To give users access to the data they need to analyze most often.
- To provide data in a form that matches the collective view of the data held by a group of users in a department or business function.
- To improve end-user response time, due to the reduction in the volume of data to be accessed.
- To provide appropriately structured data as dictated by the requirements of end-user access tools.
- Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse.
Data mart issues:
- Data mart functionality: the capabilities of data marts have increased with the growth in their popularity.
- Data mart size: performance deteriorates as data marts grow in size, so their size needs to be reduced to gain improvements in performance.
- Data mart load performance: two critical components must be balanced, end-user response time and data loading performance. Load performance can be improved with incremental database updating, so that only the cells affected by a change are updated and not the entire multidimensional database (MDDB) structure.
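The difference between rebuilding the whole structure and incrementally updating only the affected cells can be sketched as follows. The SalesCube class, its (region, month) cell layout, and the cells_touched counter are hypothetical and exist only to make the contrast visible; they do not describe any particular MDDB product.

```python
from collections import defaultdict


class SalesCube:
    """Aggregate cells keyed by (region, month); each value is a running total."""

    def __init__(self):
        self.cells = defaultdict(float)
        self.cells_touched = 0  # how many cells were written across all loads

    def full_rebuild(self, all_facts):
        """Recompute every cell from scratch from the complete fact history."""
        self.cells.clear()
        for region, month, amount in all_facts:
            self.cells[(region, month)] += amount
        self.cells_touched += len(self.cells)

    def incremental_update(self, new_facts):
        """Update only the cells affected by the newly arrived facts."""
        affected = set()
        for region, month, amount in new_facts:
            self.cells[(region, month)] += amount
            affected.add((region, month))
        self.cells_touched += len(affected)
        return affected


history = [
    ("north", "2024-01", 100.0),
    ("south", "2024-01", 50.0),
    ("north", "2024-02", 70.0),
]
cube = SalesCube()
cube.full_rebuild(history)                                     # writes 3 cells
changed = cube.incremental_update([("north", "2024-02", 30.0)])  # writes 1 cell
```

In this sketch the initial build writes three cells, while the later load writes only the single cell that the new fact falls into.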
9) Data warehousing life cycle:

The following are the typical processes involved in the data warehousing life cycle:
- Requirement Gathering
- Physical Environment Setup
- Data Modeling
- ETL
- OLAP Cube Design
- Front End Development
- Performance Tuning
- Quality Assurance
- Rolling out to Production
- Production Maintenance
- Incremental Enhancements
The first thing the project team should do is gather requirements from end users. Requirement gathering can take the form of one-to-one meetings or Joint Application Development (JAD) sessions, where multiple people discuss the project scope in the same meeting. The primary goal of this phase is to identify what constitutes success for this particular phase of the data warehouse project.
Data warehouse schema design:
The database organization:
- must look like the business
- must be recognizable by the business user
- must be approachable by the business user
- must be simple
Schema types:
- Star schema
- Fact constellation schema
- Snowflake schema
1) Star schema:
- A single fact table and, for each dimension, one dimension table
- Does not capture hierarchies directly
Dimension tables:
- Define the business in terms already familiar to users
- Wide rows with lots of descriptive text
- Small tables (about a million rows)
- Joined to the fact table by a foreign key
- Heavily indexed
- Typical dimensions: time periods, geographic region (markets, cities), products, customers, salesperson, etc.
Fact table:
- Central table
- Typical example: individual sales records
- Mostly raw numeric items
- Narrow rows, a few columns at most
- Large number of rows (millions to a billion)
- Accessed via dimensions
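A star schema of this shape can be sketched with Python's standard sqlite3 module. The table names (dim_time, dim_product, fact_sales), their columns, and the sample rows are hypothetical; the sketch only shows one narrow, numeric fact table joined to its dimension tables by foreign keys and queried through those dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes in terms familiar to users.
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: narrow rows of numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units      INTEGER,
    revenue    REAL
);

INSERT INTO dim_time    VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO dim_product VALUES (1, 'Green tea', 'Tea'), (2, 'Espresso', 'Coffee');
INSERT INTO fact_sales  VALUES (1, 1, 10, 50.0), (1, 2, 4, 40.0), (2, 1, 6, 30.0);
""")

# Facts are accessed via the dimensions: filter on time, group by product category.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time    AS t ON t.time_id = f.time_id
    WHERE t.year = 2024
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Note that the product category is stored as a plain column of dim_product, which is why a star schema does not capture the product hierarchy directly.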
2) Snowflake schema:
- Represents the dimensional hierarchy directly by normalizing tables
- Easy to maintain and saves storage
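A minimal sketch of the snowflake idea, again using Python's standard sqlite3 module: a product dimension is normalized so that the product-to-category hierarchy lives in its own table. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category level of the hierarchy is normalized into its own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue    REAL
);

INSERT INTO dim_category VALUES (1, 'Tea'), (2, 'Coffee');
INSERT INTO dim_product  VALUES (1, 'Green tea', 1), (2, 'Black tea', 1), (3, 'Espresso', 2);
INSERT INTO fact_sales   VALUES (1, 50.0), (2, 25.0), (3, 40.0);
""")


def revenue_by_category(connection):
    """Walk the hierarchy fact -> product -> category and total the revenue."""
    return connection.execute("""
        SELECT c.category_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product  AS p ON p.product_id = f.product_id
        JOIN dim_category AS c ON c.category_id = p.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


before = revenue_by_category(conn)

# Maintenance: the category name is stored once, so one row update renames it
# for every product that belongs to it.
conn.execute("UPDATE dim_category SET category_name = 'Hot tea' WHERE category_id = 1")
after = revenue_by_category(conn)
```

The trade-off visible here is one extra join per hierarchy level in exchange for storing each category name only once.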
Online analytical processing (OLAP):
Online analytical processing, or OLAP, is an approach to quickly answering multidimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting, and similar areas.
The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
MOLAP:
MOLAP stands for Multidimensional Online Analytical Processing. MOLAP is an alternative to the ROLAP (Relational OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, MOLAP differs significantly in that it requires the pre-computation and storage of information in the cube, an operation known as processing. MOLAP stores this data in optimized multidimensional array storage rather than in a relational database.
Advantages of MOLAP
- Fast query performance due to optimized storage, multidimensional indexing, and caching.
- Smaller on-disk size of data compared to data stored in a relational database, due to compression techniques.
- Automated computation of higher-level aggregates of the data.
- Very compact for low-dimension data sets.
- The array model provides natural indexing.
- Effective data extraction achieved through the pre-structuring of aggregated data.
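The pre-computation step ("processing") can be sketched in Python as follows. This is a simplification under stated assumptions: the function process_cube, the ALL marker, and the sample facts are hypothetical, and a dictionary stands in for the optimized multidimensional array storage that a real MOLAP engine would use.

```python
from itertools import product

ALL = "*"  # marker meaning "aggregated over every value of this dimension"


def process_cube(facts):
    """Pre-compute totals for every combination of dimension values and ALL."""
    cube = {}
    for *dims, measure in facts:
        # Each fact contributes to 2^n cells: every dimension is either
        # kept at its own value or rolled up to ALL.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple(ALL if rolled else d for d, rolled in zip(dims, mask))
            cube[cell] = cube.get(cell, 0) + measure
    return cube


# Hypothetical facts: (region, product, sales)
FACTS = [
    ("north", "tea", 10),
    ("north", "coffee", 30),
    ("south", "tea", 20),
]
cube = process_cube(FACTS)

# After processing, any aggregate is a single lookup rather than a scan.
north_total = cube[("north", ALL)]
tea_total = cube[(ALL, "tea")]
grand_total = cube[(ALL, ALL)]
```

The cost of this approach is paid at load time, since every aggregate is computed up front, in exchange for query-time lookups that need no further computation.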
ROLAP:
ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the pre-computation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions.
While ROLAP uses a relational database source, the database generally must be carefully designed for ROLAP use. A database that was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate it.
Advantages of ROLAP
- ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions of very high cardinality (i.e. millions of members).
- With a variety of data loading tools available, and the ability to fine-tune the ETL code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.
- The data is stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
- ROLAP tools are better at handling non-aggregatable facts (e.g. textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.
- By decoupling the data storage from the multidimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.
- The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users (SQL WHERE clause).
Comparison of OLTP systems and data warehousing systems:
OLTP systems | Data warehousing systems
Holds current data | Holds historical data
Stores detailed data | Stores detailed, lightly, and highly summarized data
Data is dynamic | Data is largely static
Repetitive processing | Ad hoc, unstructured, and heuristic processing
High level of transaction throughput | Medium to low level of transaction throughput
Predictable pattern of usage | Unpredictable pattern of usage
Transaction-driven | Analysis-driven
Application-oriented | Subject-oriented
Supports day-to-day decisions | Supports strategic decisions
Serves large number of clerical/operational users | Serves relatively low number of managerial users
Problems:
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- Data homogenization
- High demand for resources
- Data ownership
- High maintenance
- Long-duration projects
- Complexity of integration