Data Warehousing


What is data warehousing?

• A data warehouse: A single, complete and consistent store of data obtained from a variety
of different sources, made available to end users in a way they can understand and use.
• Data warehousing: The process of constructing and using a data warehouse.
A producer wants to know……
Data, Data everywhere
yet ...
• I can’t find the data I need
  - data is scattered over the network
  - many versions, subtle differences
• I can’t understand the data I found
  - available data is poorly documented
• I can’t use the data I found
  - results are unexpected
  - data needs to be transformed from one form to another

Objective: To construct and integrate systems for storing, analyzing and reporting on data.
Data warehousing is considered a core component of business intelligence.
A single, complete and consistent store of data obtained from a variety of different sources made
available to end users in a way they can understand and use in a business context. [Barry Devlin]

Data warehousing is…..


Subject Oriented: Data that gives information about a particular subject instead of about a company's
ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of
sources and merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular
time period.
Non-volatile: Data is stable in a data warehouse. More data is added but data
is never removed. This enables management to gain a consistent picture of the
business.
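The four properties above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SalesFact` and `AppendOnlyWarehouse` names, the fields and the sample figures are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalesFact:
    period: date   # time-variant: every row is tied to a time period
    subject: str   # subject-oriented: the business subject, e.g. "sales"
    source: str    # integrated: which source system supplied the row
    amount: float


class AppendOnlyWarehouse:
    """Hypothetical in-memory store that only supports adding rows."""

    def __init__(self):
        self._rows = []

    def load(self, fact: SalesFact) -> None:
        # Non-volatile: rows are added; there is no update or delete method.
        self._rows.append(fact)

    def total_for(self, period: date) -> float:
        return sum(r.amount for r in self._rows if r.period == period)


wh = AppendOnlyWarehouse()
wh.load(SalesFact(date(2023, 9, 1), "sales", "store_pos", 120.0))
wh.load(SalesFact(date(2023, 9, 1), "sales", "web_shop", 80.0))
wh.load(SalesFact(date(2023, 10, 1), "sales", "store_pos", 150.0))
september_total = wh.total_for(date(2023, 9, 1))
```

Because old rows are never changed, asking for the September total gives the same answer no matter how much later data is loaded.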
• Data warehousing combines data from multiple, usually varied, sources
into one comprehensive and easily manipulated database.
• Common ways of accessing a data warehouse include queries, analysis
and reporting.
• Because data warehousing ends in a single database, it can draw on any
number of sources, provided that the system can handle the volume.
• The final result is homogeneous data, which can be more easily
manipulated.
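A minimal sketch of how varied sources might be turned into homogeneous data. The two source layouts (`crm_rows`, `shop_rows`) and their field names are assumptions made up for the illustration.

```python
# Two hypothetical sources describing the same kind of fact in different shapes.
crm_rows = [{"CustomerName": " ada ", "Revenue_USD": "100.50"}]
shop_rows = [{"customer": "grace", "amount_cents": 2500}]


def from_crm(row):
    # Clean the name and convert the text amount to a number.
    return {"customer": row["CustomerName"].strip().title(),
            "amount": float(row["Revenue_USD"])}


def from_shop(row):
    # Convert cents to the same currency unit used by the other source.
    return {"customer": row["customer"].strip().title(),
            "amount": row["amount_cents"] / 100}


# One list, one schema: every row now has the same fields and units.
unified = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
```

Once every row shares the same fields and units, queries no longer need to know which source a row came from.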

Data warehousing is a technique for collecting and managing data from varied sources to
provide meaningful business insights. It is a blend of technologies and components that aids the
strategic use of data.
History of data warehousing
Evolution in organizational use of data warehouses
Organizations generally start off with relatively simple use of data
warehousing. Over time, more sophisticated use of data warehousing evolves.
The following general stages of use of the data warehouse can be
distinguished:
Offline Operational Database
Data warehouses in this initial stage are developed by simply copying the
data off an operational system to another server where the processing load
of reporting against the copied data does not impact the operational
system's performance.
Offline Data Warehouse
Data warehouses at this stage are updated from data in the operational
systems on a regular basis and the data warehouse data is stored in a data
structure designed to facilitate reporting.
Real Time Data Warehouse
Data warehouses at this stage are updated every time an operational
system performs a transaction (e.g. an order, a delivery or a booking).
Integrated Data Warehouse
Data warehouses at this stage are updated every time an operational
system performs a transaction. The data warehouses then generate
transactions that are passed back into the operational systems.
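The difference between a warehouse refreshed on a regular basis and one updated on every transaction can be sketched as follows. The `Warehouse` class, its method names and the transaction labels are hypothetical; real systems use scheduled load jobs or change streams rather than in-memory lists.

```python
class Warehouse:
    """Toy warehouse that is either refreshed in batches or updated per transaction."""

    def __init__(self, real_time: bool):
        self.real_time = real_time
        self.rows = []       # data visible to reports
        self._pending = []   # transactions waiting for the next refresh

    def on_transaction(self, txn):
        if self.real_time:
            self.rows.append(txn)      # real-time stage: visible immediately
        else:
            self._pending.append(txn)  # offline stage: wait for the refresh

    def scheduled_refresh(self):
        """Load everything that accumulated since the last refresh."""
        loaded = len(self._pending)
        self.rows.extend(self._pending)
        self._pending.clear()
        return loaded


offline = Warehouse(real_time=False)
live = Warehouse(real_time=True)
for txn in ["order-1", "delivery-7"]:
    offline.on_transaction(txn)
    live.on_transaction(txn)

visible_before_refresh = (len(offline.rows), len(live.rows))
loaded = offline.scheduled_refresh()
```

Before the refresh runs, only the real-time warehouse shows the two transactions; afterwards both hold the same rows.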
Components of Data warehouse
Four components of Data Warehouses are:
Load manager: The load manager is also called the front component. It performs all the operations
associated with the extraction and loading of data into the warehouse. These operations include
transformations that prepare the data for entry into the data warehouse.
Warehouse manager: The warehouse manager performs operations associated with the management of
the data in the warehouse. These include analyzing data to ensure consistency, creating
indexes and views, generating denormalizations and aggregations, transforming and merging
source data, and archiving and backing up data.
Query manager: The query manager is also known as the backend component. It performs all the
operations related to the management of user queries. It directs queries to the appropriate
tables and schedules the execution of queries.
End-user access tools: These fall into five groups: 1. data reporting tools, 2. query
tools, 3. application development tools, 4. EIS tools, 5. OLAP tools and data mining tools.
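As a rough sketch of how these components divide the work, the example below uses Python's built-in `sqlite3` module as a stand-in warehouse. The table, index and view names and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical raw rows from a source system: (date, product, price as text).
source_rows = [("2023-09-14", " phone ", "199.99"),
               ("2023-10-02", "Phone", "249.00")]


def transform(row):
    """Load-manager step: prepare a raw row for entry into the warehouse."""
    day, product, price = row
    return (day[:7], product.strip().lower(), float(price))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, product TEXT, price REAL)")

# Load manager: extract, transform and load.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [transform(r) for r in source_rows])

# Warehouse manager: create indexes, views and aggregations.
conn.execute("CREATE INDEX idx_sales_month ON sales (month)")
conn.execute("CREATE VIEW monthly_sales AS "
             "SELECT month, SUM(price) AS total FROM sales GROUP BY month")

# Query manager: direct the user's query to the appropriate table or view.
result = conn.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month").fetchall()
```

An end-user reporting tool would sit on top of the last query and present `result` as a report or chart.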
Types of Data Warehouse
Three main types of Data Warehouses are:
1. Enterprise Data Warehouse:
An enterprise data warehouse is a centralized warehouse. It provides decision support services across
the enterprise and offers a unified approach for organizing and representing data. It also provides
the ability to classify data according to subject and to give access according to those divisions.
2. Operational Data Store:
An operational data store (ODS) is a data store used when neither the data warehouse nor the OLTP
systems support an organization's reporting needs. In an ODS, the data is refreshed in real time.
Hence, it is widely preferred for routine activities such as storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of
business, such as sales or finance. In an independent data mart, data can be collected directly from
sources.
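The idea of a data mart as a subset of the warehouse can be shown with a very small sketch. The row layout and the `build_data_mart` helper are hypothetical.

```python
# Hypothetical warehouse rows covering several lines of business.
warehouse_rows = [
    {"line": "sales", "region": "north", "amount": 100},
    {"line": "finance", "region": "north", "amount": 40},
    {"line": "sales", "region": "south", "amount": 60},
]


def build_data_mart(rows, line_of_business):
    """Keep only the rows that belong to one line of business."""
    return [r for r in rows if r["line"] == line_of_business]


sales_mart = build_data_mart(warehouse_rows, "sales")
sales_total = sum(r["amount"] for r in sales_mart)
```

This corresponds to a mart fed from the warehouse; an independent data mart would apply the same filter to data read directly from the sources.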
Why Do We Need Data Warehouses
Consolidation of information resources
Improved query performance
Separate research and decision support functions from the operational systems
Foundation for data mining, data visualization, advanced reporting and OLAP tools

How is a data warehouse different from a regular database


Data warehouses use a different design from standard operational databases. The
latter are optimized to maintain strict accuracy of data in the moment by rapidly
updating real-time data. Data warehouses, by contrast, are designed to give a
long-range view of data over time. They trade off transaction volume and instead
specialize in data aggregation.
Database vs Data warehouse
Parameter | Database | Data Warehouse
Purpose | Is designed to record | Is designed to analyze
Processing method | Uses Online Transactional Processing (OLTP) | Uses Online Analytical Processing (OLAP)
Usage | Helps to perform fundamental operations for your business | Allows you to analyze your business
Tables and joins | Tables and joins are complex because they are normalized | Tables and joins are simple because they are denormalized
Orientation | An application-oriented collection of data | A subject-oriented collection of data
Storage limit | Generally limited to a single application | Stores data from any number of applications
Availability | Data is available in real time | Data is refreshed from source systems as and when needed
Design technique | ER modeling techniques are used for designing | Data modeling techniques are used for designing
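The "Tables and joins" row can be made concrete with a small `sqlite3` sketch. The schema (`customers`, `orders`, `sales_flat`) and the data are assumptions chosen for the example; the point is only that the normalized layout needs a join while the denormalized copy answers the same question from one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized, database-style layout: facts are split across related tables.
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Grace', 'New York');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);

-- Denormalized, warehouse-style copy: one wide table, no join at query time.
CREATE TABLE sales_flat AS
    SELECT o.id AS order_id, c.name AS customer, c.city AS city, o.amount AS amount
    FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

normalized = conn.execute(
    "SELECT c.city, SUM(o.amount) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.city ORDER BY c.city").fetchall()

denormalized = conn.execute(
    "SELECT city, SUM(amount) FROM sales_flat "
    "GROUP BY city ORDER BY city").fetchall()
```

Both queries return the same totals per city; the trade-off is that the flat table repeats customer details on every order row.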
Advantages of Data Warehouses
• It provides business users with a “customer-centric” view of the
company’s heterogeneous data by helping to integrate data from
sales, service, manufacturing and distribution, and other customer-related
business systems.
• It provides added value to the company’s customers by allowing them to
access better information when data warehousing is coupled with internet
technology.
• It consolidates data about individual customers and provides a repository
of all customer contacts for segmentation modeling, customer retention
planning, and cross sales analysis.
• It removes barriers among functional areas by offering a way to reconcile
views from multiple areas, thus providing a look at activities that cross
functional lines.
• It reports on trends across multidivisional, multinational operating
units, including trends or relationships in areas such as
merchandising, production planning etc.
Disadvantages of Data warehouses
• Data warehouses are not the optimal environment for
unstructured data.
• Because data must be extracted, transformed and loaded into the
warehouse, there is an element of latency in data warehouse
data.
• Over their life, data warehouses can have high costs.
Maintenance costs are high.
• Data warehouses can get outdated relatively quickly. There is a
cost of delivering suboptimal information to the organization.
• There is often a fine line between data warehouses and
operational systems. Duplicate, expensive functionality may be
developed. Or, functionality may be developed in the data
warehouse that, in retrospect, should have been developed in the
operational systems and vice versa.
Examples of Data warehouse tools
OLAP
• Online Analytical Processing (OLAP) is a category of software tools that provide analysis of data for business
decisions. OLAP systems allow users to analyze information from multiple database systems at one time. The
primary objective is data analysis, not data processing.
• Any data warehouse system is an OLAP system. Example uses of OLAP:
A company might compare its mobile phone sales in September with sales in October, then compare those
results with another location, which may be stored in a separate database. For example, Amazon analyzes
purchases by its customers to build a personalized homepage with products likely to interest each customer.
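The month-to-month comparison across locations described above might look like this in Python. The location names and unit counts are invented for the sketch, and each list stands in for data that could live in a separate database.

```python
from collections import defaultdict

# Hypothetical phone sales per location as (month, units sold) rows.
sales_by_location = {
    "store_a": [("2023-09", 120), ("2023-09", 30), ("2023-10", 180)],
    "store_b": [("2023-09", 90), ("2023-10", 80)],
}


def monthly_totals(rows):
    """Aggregate units sold per month."""
    totals = defaultdict(int)
    for month, units in rows:
        totals[month] += units
    return dict(totals)


def month_over_month(rows, earlier, later):
    """Difference in units sold between two months."""
    totals = monthly_totals(rows)
    return totals.get(later, 0) - totals.get(earlier, 0)


# Compare September with October for every location at once.
changes = {location: month_over_month(rows, "2023-09", "2023-10")
           for location, rows in sales_by_location.items()}
```

Here `changes` holds one figure per location, so a rise at one store and a drop at another can be read side by side.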
Benefits of using OLAP services
1. OLAP creates a single platform for all types of business analytical needs, including planning, budgeting,
forecasting, and analysis.
2. The main benefit of OLAP is the consistency of information and calculations.
3. Security restrictions can easily be applied to users and objects to comply with regulations and protect sensitive data.

Drawbacks of OLAP service


1. Implementation and maintenance depend on IT professionals because traditional OLAP tools require a
complicated modelling procedure.
2. OLAP tools need cooperation between people of various departments to be effective, which might not always be
possible.
OLTP
• Online transaction processing, known as OLTP for short, supports transaction-oriented applications in a 3-tier
architecture. OLTP administers the day-to-day transactions of an organization. The primary objective is data
processing, not data analysis.
• Examples of OLTP systems:
1. Online banking
2. Online airline ticket booking
3. Sending a text message
4. Order entry
5. Adding a book to a shopping cart
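The last example, adding a book to a shopping cart, can be sketched as a single short transaction using Python's `sqlite3` module. The `stock` and `cart` tables and the `add_to_cart` helper are assumptions for illustration; a production OLTP system would run on a multi-user database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (book TEXT PRIMARY KEY, "
             "qty INTEGER CHECK (qty >= 0))")
conn.execute("CREATE TABLE cart (customer TEXT, book TEXT)")
conn.execute("INSERT INTO stock VALUES ('SQL Basics', 1)")
conn.commit()


def add_to_cart(customer, book):
    """Run both changes as one transaction: either both happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO cart VALUES (?, ?)", (customer, book))
            conn.execute("UPDATE stock SET qty = qty - 1 WHERE book = ?", (book,))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative quantity; nothing was saved.
        return False


first = add_to_cart("alice", "SQL Basics")
second = add_to_cart("bob", "SQL Basics")   # no copies left
cart_size = conn.execute("SELECT COUNT(*) FROM cart").fetchone()[0]
```

The second attempt fails as a whole: the cart row inserted at the start of that transaction is rolled back together with the rejected stock update.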

Benefits of OLTP method


• It administers daily transactions of an organization.
• OLTP widens the customer base of an organization by simplifying individual processes.

Drawbacks of OLTP method


• If an OLTP system faces hardware failures, online transactions are severely affected.
• OLTP systems allow multiple users to access and change the same data at the same time,
which often creates unprecedented situations.
OLAP vs OLTP

We can divide IT systems into transactional (OLTP) and analytical (OLAP). OLTP systems provide
source data to data warehouses, whereas OLAP systems help to analyze it.
Difference between OLTP and OLAP
Parameter | OLTP | OLAP
Process | It is an online transactional system. It manages database modification. | OLAP is an online analysis and data retrieving process.
Characteristic | It is characterized by large numbers of short online transactions. | It is characterized by a large volume of data.
Functionality | An online database modifying system. | An online database query management system.
Source | OLTP and its transactions are the sources of data. | Different OLTP databases become the source of data for OLAP.
Usefulness | It helps to control and run fundamental business tasks. | It helps with planning, problem-solving and decision support.
Queries | Insert, update, and delete information from the database. | Mostly select operations.
Operation | Allows read/write operations. | Mostly reads, rarely writes.
Purpose | Designed for real-time business operations. | Designed for analysis of business measures by category and attributes.
User type | It is used by data-critical users like clerks, DBAs and database professionals. | Used by data knowledge users like workers, managers, and CEOs.
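The "Queries" and "Operation" rows can be illustrated with one small `sqlite3` session. The `orders` table and its values are hypothetical; the first group of statements is OLTP-style modification, the final query is an OLAP-style read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders "
             "(id INTEGER PRIMARY KEY, category TEXT, amount REAL)")

# OLTP-style work: many short statements that insert, update or delete rows.
conn.execute("INSERT INTO orders VALUES (1, 'books', 20.0)")
conn.execute("INSERT INTO orders VALUES (2, 'phones', 300.0)")
conn.execute("INSERT INTO orders VALUES (3, 'books', 15.0)")
conn.execute("UPDATE orders SET amount = 25.0 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 3")
conn.commit()

# OLAP-style work: a read-only SELECT that summarizes a measure by category.
by_category = conn.execute(
    "SELECT category, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY category ORDER BY category").fetchall()
```

In practice the two kinds of work run on separate systems, with the OLTP database acting as the source for the analytical one.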
DATA CUBES
• A data cube is generally used to make data easy to interpret. It is especially useful when representing data
together with dimensions as certain measures of business requirements. Every dimension of a cube
represents a certain characteristic of the database, for example, daily, monthly or yearly sales. The data
included inside a data cube makes it possible to analyze almost all the figures for virtually any or all
customers, sales agents, products, and much more. Thus, a data cube can help to establish trends and
analyze performance. Data cubes fall mainly into two categories:
• Multidimensional Data Cube: Most OLAP products are developed based on a structure where the cube
is patterned as a multidimensional array. These MOLAP (multidimensional OLAP) products usually offer
improved performance when compared to other approaches because they can index directly into the
structure of the data cube to gather subsets of data. As the number of dimensions grows, the cube becomes
sparser. That means that many cells that represent particular attribute combinations will not contain any
aggregated data. This in turn increases the storage requirements, which may reach undesirable levels at
times, making the MOLAP solution untenable for huge data sets with many dimensions. Compression
techniques might help; however, their use can damage the natural indexing of MOLAP.
• Relational OLAP: Relational OLAP (ROLAP) makes use of the relational database model. The ROLAP data
cube is implemented as a set of relational tables (approximately twice as many as the number of dimensions)
rather than as a multidimensional array. Each of these tables, known as a cuboid, represents a specific
view.
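The two categories can be contrasted on the same small data set. The sketch below is an assumption-laden illustration in plain Python: the fact rows, dimension values and helper names are invented, nested lists stand in for a multidimensional array, and dictionaries stand in for relational tables.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, month, units sold).
facts = [
    ("phone", "2023-09", 120),
    ("phone", "2023-10", 150),
    ("laptop", "2023-10", 40),
]
products = ["phone", "laptop", "tablet"]
months = ["2023-09", "2023-10"]

# MOLAP-style: an array indexed by dimension position; None marks an empty cell.
cube = [[None for _ in months] for _ in products]
for product, month, units in facts:
    cube[products.index(product)][months.index(month)] = units


def cell(product, month):
    """Direct index lookup into the array, with no scan over the facts."""
    return cube[products.index(product)][months.index(month)]


all_cells = [value for row in cube for value in row]
sparsity = all_cells.count(None) / len(all_cells)  # share of empty cells

# ROLAP-style: one aggregated table (cuboid) per subset of the dimensions.
dimension_index = {"product": 0, "month": 1}


def build_cuboid(dims):
    """Group the facts by the given dimensions and sum the units."""
    table = defaultdict(int)
    for row in facts:
        key = tuple(row[dimension_index[d]] for d in dims)
        table[key] += row[2]
    return dict(table)


cuboids = {
    dims: build_cuboid(dims)
    for size in range(len(dimension_index) + 1)
    for dims in combinations(dimension_index, size)
}
```

In the array form, half of the cells are empty even in this tiny example, which hints at the sparsity problem described above. In the table form, only combinations that actually occur are stored, and each cuboid (`cuboids[("product",)]`, `cuboids[("month",)]` and so on) is one view of the data.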
