Unit IV - Data Warehousing and OLAP Technologies


Course Outcomes

• CO1: Understand relational and object-oriented databases.


• CO2: Learn and understand parallel and
distributed database architectures.
• CO3: Learn the concepts of NoSQL Databases.
• CO4: Understand data warehouse and OLAP technologies.
• CO5: Apply data mining algorithms and learn
various software tools.
• CO6: Learn emerging and enhanced data models
for advanced applications.
UNIT IV
• Architectures and components of data warehouse,
Characteristics and limitations of data warehouse, Data
warehouse schema (Star, Snowflake), OLAP
Architecture (ROLAP/MOLAP/HOLAP), Introduction
to decision support system, Views and Decision
support
At the end of Unit IV,

• CO4: Understand data warehouse and OLAP technologies


Introduction
• Two types of Data
• E.g., DMart
• Operational Data (Day to Day)

• Strategic Data/Information (DW)


Introduction

• A data warehouse is a subject-oriented, integrated,
nonvolatile, and time-variant collection of data in
support of management's decisions.
Characteristics of DWH

• A data warehouse is subject-oriented.
• In the data warehouse, data is not stored by
operational application, but by business subject.
Characteristics of DWH
• A data warehouse is integrated.
• Data inconsistencies are removed; data from
diverse operational applications is integrated.
Characteristics of DWH
• A Data Warehouse is nonvolatile
• Usually the data in the data warehouse is not updated or deleted.
• The data in a data warehouse is not as volatile as the data in an
operational database is.
• The data in a data warehouse is primarily for query and analysis.
Characteristics of DWH

• A Data Warehouse is Time Variant


• Allows for analysis of the past
• Relates information to the present
• Enables forecasts for the future
Advantages of DWH

• Enhance performance
• Uniformity
• Potential high returns on investments
• Secure Information
• Competitive advantage
• Store historic data
Disadvantages of DWH

• Complexity of integration.
• Time consuming process
• Changing requirements of End user.
• High investments on initial set up.
• High maintenance cost.
Components of DWH
• Source Data Component
i) Production Data
ii) Internal Data
iii) Archived Data
iv) External Data
• Data Staging Component
• Data Storage Component
• Information Delivery Component
• Metadata Component
• Management and Control Component
Architecture of DWH
Three major areas
DWH and Data Mart

• A data mart is a logical subset of the complete DWH, a sort
of pie wedge of the whole DWH.
• Data marts are analytical data stores designed to focus on specific
business functions, i.e. they are subject-oriented.
• The collection of all data marts in a particular area forms an
integrated DWH.
Data Warehouse | Data Mart
Corporate/enterprise-wide data | Department-wise data
Aggregation of all data marts | A single business process
Data received from staging area | Data received from facts and dimensions
Queries on presentation resource | Data access and analysis is simple
Structure to suit corporate view of data | Structure to suit departmental view of data
DWH Design Strategy

• 1) Top down Approach

• 2) Bottom up Approach
Advantages (Top-down Approach)

• i) More corporate effort is required to build a huge DWH
• ii) Single, central storage of subject-oriented data, with
centralized rules and control.

Disadvantages (Top-down Approach)
i) More time-consuming method
ii) High risk of failure if there are too few experienced professionals
with cross-functional skills
Advantages (Bottom-up Approach)

• i) Faster implementation of manageable data marts
• ii) Less risk of failure

Disadvantages (Bottom-up Approach)
i) Each data mart has its own limited amount of data
ii) Possibility of redundant, inconsistent, and irreconcilable data in
every data mart.
Practical Approach for Designing DWH

Plan and define requirements of DWH

Create surrounding Architecture

Standardization of Data

Implement DWH
Dimensional Modeling

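• Dimensional modeling (DM) is a design concept used by many DWH
designers to build their data warehouses.
• DM is a logical design technique.
• Every dimensional model is composed of one table with a multipart key
(many foreign keys), called the fact table, which contains the
facts/measurements of the business, and a set of smaller tables, called
dimension tables, which contain the context of the measurements, i.e. the
dimensions on which the facts are calculated.
• Each dimension table (DT) has a single-part primary key (PK) that
corresponds to one component of the multipart key in the fact table (FT).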
Designing Dimensional Model


• Identifying the business process
• Identifying the level of detail needed (grain)
• Identifying the dimensions
• Choosing the facts
• Choosing the duration of DB
Introduction

• Operational DBs: OLTP
• Data warehousing (DWH): OLAP
Architecture

• The structure that brings all the components of a data
warehouse together is known as the architecture.

• E.g. College Building


Architecture and Components of DWH
Components of DWH

• There are mainly five components of a DWH:
• Data Warehouse Database
i) Typical relational DBs
ii) Analytical DBs
iii) Data warehouse appliances
iv) Cloud-hosted DBs
• ETL Tools (Extraction, Transformation, Load)
• Metadata: data about data, which defines the DWH
• Data Warehouse Access Tools
• Data Mart
Components of DWH

• Load Manager
• Warehouse Manager
• Query manager
Data Warehouse

• A DW is a copy of transaction data specifically
structured for querying and reporting.
• A DW includes business intelligence tools, tools to
extract, transform, and load data into the repository,
and tools to manage and retrieve metadata.
Data Warehouse Schema

• A schema is a collection of database
objects, which include tables, views, indexes, etc.

• Two Types:
• Star Schema
• Snowflake Schema
Data Warehouse Schema_Star Schema

• It consists of a single fact table and many
dimension tables.
• It looks like a star: the fact table sits at the center and the
dimension tables radiate outward.
• Each dimension table has a primary key (PK), and the fact
table contains the PKs of all dimension tables as foreign keys (FKs).
• It is also known as the star join schema.
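The star layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables (dim_product, dim_store, dim_date, fact_sales), their columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database; all table names, columns and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: its multipart key is made of the PKs of all dimension tables.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL,
    PRIMARY KEY (product_id, store_id, date_id)
);

INSERT INTO dim_product VALUES (1, 'Rice', 'Grocery'), (2, 'Soap', 'Household');
INSERT INTO dim_store   VALUES (10, 'Pune'), (20, 'Mumbai');
INSERT INTO dim_date    VALUES (100, 2023, 'Q1'), (101, 2023, 'Q2');
INSERT INTO fact_sales  VALUES (1, 10, 100, 500.0), (1, 20, 100, 300.0), (2, 10, 101, 200.0);
""")

# A typical star join: the fact table joined directly to each dimension it needs.
star_rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(star_rows)
```

Each dimension is one join away from the fact table, which is why queries against a star schema stay short.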
Data Warehouse Schema_Star Schema

• Fact Tables are normalized


• Dimension Tables are not normalized
• Advantages
• Simplest and easiest
• Optimizes navigation through db
• Most suitable for query processing
Data Warehouse Schema_Snowflake Schema

• It is a variation of the star schema in which the dimension
tables are organized in multiple levels.
• Dimension tables are normalized.
Data Warehouse Schema_Snowflake Schema
• Fact Tables are normalized
• Dimension Tables are also normalized
• Advantages
• Less Redundancies
• DTs are easy to update
• Disadvantages
• Complex schema (complex queries)
• The number of join operations is increased
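As a rough sketch of the trade-off listed above, the example below normalizes a product dimension into a separate category table, again using sqlite3. The table and column names (dim_category, dim_product, fact_sales) and the data are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is moved out of dim_product into its own table,
-- so each category name is stored once (less redundancy, easier updates).
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);

INSERT INTO dim_category VALUES (1, 'Grocery'), (2, 'Household');
INSERT INTO dim_product  VALUES (1, 'Rice', 1), (2, 'Wheat', 1), (3, 'Soap', 2);
INSERT INTO fact_sales   VALUES (1, 500.0), (2, 300.0), (3, 200.0);
""")

# Reaching the category now takes two joins instead of one.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)
```

Renaming a category touches a single row in dim_category, but every query by category pays for the additional join.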
OLAP (Online Analytical Processing)

• Online analytical processing (OLAP), or information
system, and online transaction processing (OLTP), or
operational system, are two different data processing
systems designed for different purposes.
• OLAP is optimized for complex data analysis and
reporting.
• OLTP is optimized for transactional processing and
real-time updates.
Difference between OLTP & OLAP
OLAP (Online Analytical
Processing)

• OLAP (online analytical processing) is a
computing method that enables users to easily
and selectively extract and query data in order to
analyze it from different points of view.
OLAP (Online Analytical
Processing)
• OLAP (online analytical processing) is an approach
to answering multi-dimensional analytical queries.
• OLAP is part of the broader category of business
intelligence, which also encompasses relational
databases, report writing, and data mining.

OLAP (Online Analytical
Processing)
• Applications of OLAP include:
• business reporting for sales,
• marketing,
• management reporting,
• business process management (BPM),
• budgeting,
• financial reporting and similar areas, with
new applications emerging.
OLAP (Online Analytical
Processing)
• At the core of any OLAP system is an OLAP cube (also called a
'multidimensional cube' or 'hypercube').
• It consists of numeric facts called measures that are
categorized by dimensions.
• Each measure can be thought of as having a set of labels,
or metadata, associated with it. A dimension is what
describes these labels; it provides information about
the measure.
• A simple example would be a cube that contains a store's
sales as a measure, and Date/Time as a dimension. Each
sale has a Date/Time label that describes more about that
sale.
OLAP (Online Analytical
Processing)
Measure (sales):
Timeid | Sale amount
123 | 5000

Dimension (Date/Time):
Timeid | timestamp
123 | 20200813 11:35:32
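The two small tables above can be read as one measure row and one dimension row linked by Timeid. A minimal Python sketch of that lookup follows; the variable names are hypothetical, and reading 20200813 as 13 August 2020 is an assumption about the slide's date format.

```python
from datetime import datetime

# Dimension: Timeid -> Date/Time label (20200813 assumed to mean 2020-08-13).
time_dim = {123: datetime(2020, 8, 13, 11, 35, 32)}

# Measure: (Timeid, sale amount) rows.
sales_facts = [(123, 5000)]

# Attach the Date/Time label to every sale by looking up its Timeid.
labelled_sales = [(time_dim[time_id], amount) for time_id, amount in sales_facts]
print(labelled_sales)
```

The sale amount itself carries no context; the dimension row is what tells the analyst when the sale happened.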
OLAP
• Since OLAP contains multidimensional
data usually obtained from different sources, it
requires a special method of storing that data.
• Using a spreadsheet with rows and columns is
good for two-dimensional data, but not for
multidimensional data.
• Instead, OLAP cubes should be used for that
purpose.
• OLAP data is typically stored in a star schema or
snowflake schema in a relational data warehouse
Types of OLAP

• ROLAP (Relational)
• MOLAP(Multidimensional)
• HOLAP(Hybrid)
Types of OLAP
• ROLAP (Relational) (Features + Advantages)
• ROLAP works directly with relational databases.
• The base data and the dimension tables are stored as
relational tables, and new tables are created to hold the
aggregated information.
• It depends on a specialized schema design.
• It is more scalable.
• ROLAP tools can analyze large volumes of data.
• The data is stored in a standard relational database
and can be accessed by any SQL reporting tool.
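The idea of keeping base data in relational tables and creating new tables for the aggregated information can be sketched with plain SQL through sqlite3. The table names (sales, agg_sales_by_city_year) and the rows are hypothetical; real ROLAP tools manage such summary tables themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base data stored as an ordinary relational table.
CREATE TABLE sales (city TEXT, year INTEGER, amount REAL);
INSERT INTO sales VALUES
    ('Pune', 2022, 100.0), ('Pune', 2023, 150.0),
    ('Mumbai', 2023, 250.0), ('Mumbai', 2023, 50.0);

-- A new table created to hold the aggregated information.
CREATE TABLE agg_sales_by_city_year AS
    SELECT city, year, SUM(amount) AS total_amount
    FROM sales
    GROUP BY city, year;
""")

# Any SQL reporting tool could now read the summary table directly.
agg_rows = conn.execute(
    "SELECT city, year, total_amount FROM agg_sales_by_city_year ORDER BY city, year"
).fetchall()
print(agg_rows)
```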
Types of OLAP

• Disadvantages of ROLAP (Relational)
• ROLAP tools have slower performance than MOLAP tools.
• Poor query performance: pre-processing large volumes of data is
difficult to implement efficiently, so it is frequently skipped.
• Since ROLAP tools rely on SQL for all of the
computations, they are not suitable when the model is
heavy on calculations that do not translate well
into SQL.
Types of OLAP

• MOLAP (Multidimensional) (Features + Advantages)
• The classic form of OLAP, sometimes referred to as just
OLAP.
• MOLAP stores data in optimized multi-dimensional
array storage, rather than in a relational
database.
• It can perform complex computations.
• Information retrieval is fast, so queries get a fast
response.
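To make the array-storage idea concrete, here is a small sketch that keeps a three-dimensional cube in nested Python lists and addresses cells by position. The dimension members and amounts are made up, and real MOLAP engines use far more compact, optimized array formats than this.

```python
# Hypothetical dimension members; each gets a fixed position along its axis.
products = ["Rice", "Soap"]
cities = ["Pune", "Mumbai"]
quarters = ["Q1", "Q2"]

p_idx = {name: i for i, name in enumerate(products)}
c_idx = {name: i for i, name in enumerate(cities)}
q_idx = {name: i for i, name in enumerate(quarters)}

# Dense 3-D array: cube[product][city][quarter] holds the sales amount.
cube = [[[0.0 for _ in quarters] for _ in cities] for _ in products]

def load(product, city, quarter, amount):
    """Add a sale to the cell addressed by the three dimension members."""
    cube[p_idx[product]][c_idx[city]][q_idx[quarter]] += amount

def cell(product, city, quarter):
    """Read a cell by direct index arithmetic; no join or table scan is needed."""
    return cube[p_idx[product]][c_idx[city]][q_idx[quarter]]

load("Rice", "Pune", "Q1", 500.0)
load("Rice", "Mumbai", "Q1", 300.0)
load("Soap", "Pune", "Q2", 200.0)

print(cell("Rice", "Pune", "Q1"))
```

Direct addressing is what makes retrieval fast, at the cost of allocating a cell for every combination of members, including those with no sales.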
Types of OLAP

• HOLAP (Hybrid) (ROLAP + MOLAP)


•Combines advantages of both
•HOLAP tools can utilize both pre-calculated
cubes and relational data sources.
OLAP System
Architecture
OLAP Operations
• The four types of analytical OLAP operations
are:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
OLAP Operations
• Roll-up (one of the four analytical OLAP operations)
• It is also known as aggregation: data is summarized by moving up
a dimension hierarchy or by removing a dimension.
OLAP Operations: Roll-up
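A minimal sketch of roll-up, assuming a tiny cube keyed by (city, quarter) and a city-to-state hierarchy. The figures and names are illustrative only.

```python
from collections import defaultdict

# Hypothetical base cube: (city, quarter) -> sales amount.
sales_by_city = {
    ("Pune", "Q1"): 500,
    ("Mumbai", "Q1"): 300,
    ("Bengaluru", "Q1"): 200,
    ("Pune", "Q2"): 100,
}

# Hierarchy for the location dimension: city -> state.
city_to_state = {
    "Pune": "Maharashtra",
    "Mumbai": "Maharashtra",
    "Bengaluru": "Karnataka",
}

def roll_up(cube, hierarchy):
    """Aggregate the cube by replacing each city with its parent level."""
    rolled = defaultdict(int)
    for (city, quarter), amount in cube.items():
        rolled[(hierarchy[city], quarter)] += amount
    return dict(rolled)

sales_by_state = roll_up(sales_by_city, city_to_state)
print(sales_by_state)
```

Pune and Mumbai collapse into a single Maharashtra cell per quarter, so the result has fewer, coarser cells than the base cube.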
OLAP Operations: Drill Down
• Drill-down (one of the four analytical OLAP operations)
• Data is fragmented into smaller, more detailed parts.
• It is the opposite of the roll-up process.
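Drill-down can be sketched the same way, assuming detailed data is held per month and summarized per quarter. The names and figures below are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed data: (quarter, month) -> sales amount.
monthly_sales = {
    ("Q1", "Jan"): 100,
    ("Q1", "Feb"): 150,
    ("Q1", "Mar"): 250,
    ("Q2", "Apr"): 300,
}

# Summary level that an analyst would look at first.
quarterly_sales = defaultdict(int)
for (quarter, _month), amount in monthly_sales.items():
    quarterly_sales[quarter] += amount

def drill_down(quarter):
    """Break one quarter's total into its more detailed monthly parts."""
    return {month: amount
            for (q, month), amount in monthly_sales.items()
            if q == quarter}

q1_detail = drill_down("Q1")
print(quarterly_sales["Q1"], q1_detail)
```

The monthly values returned for a quarter always add up to that quarter's total, which is the sense in which drill-down reverses roll-up.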
OLAP Operations
• Slice and dice (one of the four analytical OLAP operations)
• Slice
• One dimension is selected, and a new sub-cube is
created.
• Dice
• This operation is similar to a slice. The difference is that
in a dice you select two or more dimensions, which results in
the creation of a sub-cube.
OLAP Operations: Slice
OLAP Operations: Dice
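The slice and dice operations described above can be sketched on a small cube keyed by (product, city, quarter). The data and names are assumptions for illustration.

```python
# Hypothetical cube: (product, city, quarter) -> sales amount.
cube = {
    ("Rice", "Pune", "Q1"): 500,
    ("Rice", "Mumbai", "Q1"): 300,
    ("Soap", "Pune", "Q2"): 200,
    ("Soap", "Mumbai", "Q1"): 100,
}

def slice_cube(cube, quarter):
    """Slice: fix one value on one dimension (quarter) and drop that dimension."""
    return {(product, city): amount
            for (product, city, q), amount in cube.items()
            if q == quarter}

def dice_cube(cube, products, cities):
    """Dice: select values on two dimensions and keep all three dimensions."""
    return {key: amount
            for key, amount in cube.items()
            if key[0] in products and key[1] in cities}

q1_slice = slice_cube(cube, "Q1")
rice_dice = dice_cube(cube, products={"Rice"}, cities={"Pune", "Mumbai"})
print(q1_slice)
print(rice_dice)
```

The slice returns a two-dimensional sub-cube for Q1 only, while the dice keeps the full key and simply narrows the members on the selected dimensions.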
Introduction to decision support system

• Any system that provides clarity about data can be
considered a DSS.
• E.g. Google Maps
• In healthcare, a doctor might use a computerized DSS
to diagnose and prescribe medication.
• A DSS can be used to produce easy-to-understand
reports about key data trends based on user
specifications, and to build predictive models and
visualizations.
Introduction to decision support system

• DSSs are a specific class of computerized
information systems that support business and
organizational decision-making activities.
Introduction to decision support system
• A system for decision making and problem solving.
• DSS Components can be classified as:
• Inputs: Factors, numbers, characteristics to analyze
• User Knowledge and expertise: Manual analysis by
the users.
• Outputs: Transformed data from which DSS
decisions are generated.
• Decisions: Results generated by the DSS based on
user criteria.
Requirements of decision support system

• Data from multiple sources
• Data formatting and collation
• A suitable database location and format built
for decision-support-based reporting and analysis
• Robust tools and applications to report, monitor, and
analyze the data
Applications of decision support system

• Medical diagnosis
• Business and management
• Agriculture Production
• Forest Management
Summary of decision support system
THANK YOU…
