CH 04 PPTaccessible

Business Intelligence, Analytics, Data Science, and AI
Fifth Edition

Chapter 4
Descriptive Analytics II: Business Intelligence, Data Warehousing, and
Visualization

Copyright © 2024, 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved
Learning Objectives (1 of 2)
4.1 Understand the basic definitions and concepts of data
warehousing
4.2 Understand data warehousing architectures
4.3 Describe the processes used in developing and
managing data warehouses
4.4 Explain data warehousing operations
4.5 Explain the role of data warehouses in decision support
4.6 Explain data integration and the extraction,
transformation, and load (ETL) processes

Learning Objectives (2 of 2)
4.7 Understand the importance of data warehouse
administration, security issues, and future trends
4.8 Define business reporting, and understand its historical
evolution
4.9 Understand the importance of data visualization
4.10 Learn different types of visualization techniques
4.11 Appreciate the value that visual analytics brings to
business analytics
4.12 Know the capabilities and limitations of dashboards

Opening Vignette
Targeting Tax Fraud with Business Intelligence and Data
Warehousing
1. Why is it important for the IRS and for U.S. state governments to
use data warehousing and business intelligence (BI) tools in
managing state revenues?
2. What were the challenges the state of Maryland was facing
with regard to tax fraud?
3. What was the solution they adopted? Do you agree with their
approach? Why?
4. What were the results that they obtained? Did the investment
in BI and data warehousing pay off?
5. What other problems and challenges do you think federal and
state governments are having that can benefit from BI and
data warehousing?

Business Intelligence and Data Warehousing
• BI used to be everything related to the use of data for
managerial decision support
• Now, it is a part of Business Analytics
– BI = Descriptive Analytics

What is a Data Warehouse?
• A physical repository where relational data are specially
organized to provide enterprise-wide, cleansed data in a
standardized format
• A relational database? (so what is the difference?)
• “The data warehouse is a collection of integrated,
subject-oriented databases designed to support DSS
functions, where each unit of data is non-volatile and
relevant to some moment in time”

A Historical Perspective to Data
Warehousing

Analytics In Action 4.1
Data-Driven Customer Experience in Financial Services
• Introduction
• Challenges
• Solutions
• Results

Characteristics of DWs
• Subject oriented
• Integrated
• Time-variant (time series)
• Nonvolatile
• Summarized
• Not normalized
• Metadata
• Web based, relational/multi-dimensional
• Client/server, real-time/right-time/active …
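Two of these characteristics, time-variant and nonvolatile, can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the table name customer_history, its columns, and the sample rows are hypothetical and do not come from the chapter. Changes are appended as new dated rows rather than overwriting old ones, so any past state can still be queried.

```python
import sqlite3

# Hypothetical history table; names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history (customer_id INTEGER, city TEXT, valid_from TEXT)"
)


def record_city(customer_id, city, valid_from):
    """Nonvolatile: append a new dated row; earlier rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )


def city_as_of(customer_id, date):
    """Time-variant: return the city that was current on the given ISO date."""
    row = conn.execute(
        "SELECT city FROM customer_history "
        "WHERE customer_id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (customer_id, date),
    ).fetchone()
    return row[0] if row else None


record_city(1, "Tulsa", "2022-01-01")
record_city(1, "Austin", "2023-06-15")

print(city_as_of(1, "2022-12-31"))
print(city_as_of(1, "2024-01-01"))
```

An operational system would typically just overwrite the city; keeping both rows is what lets the warehouse answer questions about a specific moment in time.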

Data Mart
A departmental small-scale “DW” that stores only
limited/relevant data
• Dependent data mart
– A subset that is created directly from a data warehouse
• Independent data mart
– A small data warehouse designed for a strategic
business unit or a department
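As a rough sketch of a dependent data mart, the Python below derives a departmental subset directly from a warehouse table. The table names (warehouse_sales, marketing_mart), columns, and rows are hypothetical, and SQLite stands in for whatever DBMS an organization actually uses.

```python
import sqlite3

# Hypothetical enterprise warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "widget", 120.0),
        ("marketing", "gadget", 80.0),
        ("finance", "widget", 300.0),
    ],
)

# Dependent data mart: a subset created directly from the data warehouse.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

An independent data mart would instead be loaded from source systems without passing through the warehouse, which is why the two can drift apart in their definitions.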

Other DW Components
• Operational data stores (ODS)
– A type of database often used as an interim area for a
data warehouse
• Oper marts
– An operational data mart
• Enterprise data warehouse (EDW)
– A data warehouse for the enterprise
• Metadata – “data about data”
– In DW metadata describe the contents of a data
warehouse and its acquisition and use

DW for Data-Driven Decision Making
• An example of a DW supporting data-driven decision
making in automotive industry

A Generic DW Framework

Data Warehousing Architectures
• Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access
and analyze data from the warehouse
• Two-tier architecture
– First two tiers in three-tier architecture are combined
into one
… sometimes there is only one tier?

DW Architectures
Figure: 3-tier architecture, 2-tier architecture, 1-tier architecture?

Data Warehousing Architectures
• Issues to consider when deciding which architecture to
use:
– Which database management system (DBMS) should
be used?
– Will parallel processing and/or partitioning be used?
– Will data migration tools be used to load the data
warehouse?
– What tools will be used to support data retrieval and
analysis?

A Web-based DW Architecture

Alternative DW Architectures (1 of 2)

Alternative DW Architectures (2 of 2)

• Each architecture has advantages and disadvantages!
• Which architecture is the best?

Ten Factors that Potentially Affect the
Architecture Selection Decision
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors

Data Management and Warehouse
Development
• A data warehousing project is a major undertaking
• DW provides both direct and indirect benefits
• Data warehouse development approaches
– Inmon Model: EDW approach (top-down)
– Kimball Model: Data mart approach (bottom-up)
– Which model is best?
• Table 4.3 provides a comparative analysis between EDW and
Data Mart approach
• Another alternative is the hosted data warehouses

Comparing EDW and Data Mart (1 of 2)
Table 4.3 Contrasts between the DM and EDW Development
Approaches
Effort                        | DM Approach                                | EDW Approach
Scope                         | One subject area                           | Several subject areas
Development time              | Months                                     | Years
Development cost              | $10,000 to $100,000+                       | $1,000,000+
Development difficulty        | Low to medium                              | High
Data prerequisite for sharing | Common (within business area)              | Common (across enterprise)
Sources                       | Only some operational and external systems | Many operational and external systems
Size                          | Megabytes to several gigabytes             | Gigabytes to petabytes
Time horizon                  | Near-current and historical data           | Historical data
Data transformations          | Low to medium                              | High
Comparing EDW and Data Mart (2 of 2)
Table 4.3 Contrasts between the DM and EDW Development
Approaches (Continued)
Effort                       | DM Approach                                    | EDW Approach
Update frequency             | Hourly, daily, weekly                          | Weekly, monthly
Technology
Hardware                     | Workstations and departmental servers          | Enterprise servers and mainframe computers
Operating system             | Windows and Linux                              | Unix, z/OS, OS/390
Databases                    | Workgroup or standard database servers         | Enterprise database servers
Usage
Number of simultaneous users | 10s                                            | 100s to 1,000s
User types                   | Business area analysts and managers            | Enterprise analysts and senior executives
Business spotlight           | Optimizing activities within the business area | Cross-functional optimization and decision making

Additional DW Considerations: Hosted
Data Warehouses
• Benefits:
– Requires minimal investment in infrastructure
– Frees up capacity on in-house systems
– Frees up cash flow
– Makes powerful solutions affordable
– Enables solutions that provide for growth
– Offers better quality equipment and software
– Provides faster connections
– … more in the book

Representation of Data in DW
• Dimensional Modeling
– A retrieval-based system that supports high-volume
query access
• Star schema
– The most commonly used and the simplest style of
dimensional modeling
– Contains a fact table surrounded by and connected to
several dimension tables
• Snowflake schema
– An extension of star schema where the diagram
resembles a snowflake in shape
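To make the star schema concrete, here is a minimal sketch in Python using SQLite. All table names (fact_sales, dim_product, dim_region, dim_date), columns, and rows are assumptions made for illustration; the chapter does not prescribe them. One central fact table holds the measures and points to each dimension table by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, connected to several dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_date    VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1,  5,  50.0),
    (2, 1, 2,  2,  40.0),
    (1, 1, 2,  3,  30.0);
""")

# A typical query joins the fact table to one dimension and aggregates a measure.
revenue_by_region = conn.execute("""
    SELECT r.name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
print(revenue_by_region)
```

A snowflake variant would further split a dimension into related tables, for example moving a region's country into its own table that dim_region references.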

Multidimensionality
The ability to organize, present, and analyze data by several
dimensions, such as sales by region, by product, by
salesperson, and by time (four dimensions)
• Multidimensional presentation
– Dimensions: products, salespeople, market
segments, business units, geographical locations,
distribution channels, country, or industry
– Measures: money, sales volume, head count,
inventory profit, actual versus forecast
– Time: daily, weekly, monthly, quarterly, or yearly
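The idea of viewing one measure along several dimensions can be sketched in a few lines of Python. The records, the helper name total_by, and the dimension values below are all made up for illustration; the point is only that the same sales figures can be totaled by any combination of region, product, salesperson, and time.

```python
from collections import defaultdict

# Each record carries four dimensions and one measure (sales); data is invented.
records = [
    {"region": "East", "product": "widget", "salesperson": "Ann", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "gadget", "salesperson": "Bob", "quarter": "Q1", "sales": 40},
    {"region": "West", "product": "widget", "salesperson": "Ann", "quarter": "Q2", "sales": 70},
    {"region": "West", "product": "widget", "salesperson": "Cy", "quarter": "Q1", "sales": 30},
]


def total_by(rows, dimensions, measure="sales"):
    """Sum a measure over any chosen combination of dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_region = total_by(records, ["region"])
by_product_quarter = total_by(records, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```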

Star Schema versus Snowflake
Schema
Figure 4.9 (a) The Star Schema, and (b) the Snowflake
Schema

Analysis of Data in DW
• OLTP versus OLAP
• OLTP (Online Transaction Processing)
– Capturing and storing data from ERP, CRM, POS, …
– The main focus is on efficiency of routine tasks
• OLAP (Online Analytical Processing)
– Converting data into information for decision support
– Data cubes, drill-down/rollup, slice & dice, …
– Requesting ad hoc reports
– Conducting statistical and other analyses
– Developing multimedia-based applications
– … more in the book

OLAP versus OLTP

Table 4.5 A Comparison between OLTP and OLAP

Criteria              | OLTP | OLAP
Purpose               | To carry out day-to-day business functions | To support decision making and provide answers to business and management queries
Data source           | Transaction database (a normalized data repository primarily focused on efficiency and consistency) | Data warehouse or DM (a nonnormalized data repository primarily focused on accuracy and completeness)
Reporting             | Routine, periodic, narrowly focused reports | Ad hoc, multidimensional, broadly focused reports and queries
Resource requirements | Ordinary relational databases | Multiprocessor, large-capacity, specialized databases
Execution speed       | Fast (recording of business transactions and routine reports) | Slow (resource intensive, complex, large-scale queries)

OLAP Operations
• Slice - a subset of a multidimensional array
• Dice - a slice on more than two dimensions
• Drill Down/Up - navigating among levels of data ranging
from the most summarized (up) to the most detailed
(down)
• Roll Up - computing all of the data relationships for one or
more dimensions
• Pivot - used to change the dimensional orientation of a
report or an ad hoc query-page display
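The operations above can be sketched on a tiny three-dimensional cube. This is a simplified illustration, not how an OLAP server implements them: the cube is a plain Python dictionary, and the function names, dimension names, and figures are all hypothetical. Here "dice" is shown as restricting several dimensions at once, and drilling down corresponds to moving from a more rolled-up view to a less rolled-up one.

```python
from collections import defaultdict

# cube[(product, region, quarter)] = sales; values are invented.
cube = {
    ("widget", "East", "Q1"): 100,
    ("widget", "East", "Q2"): 30,
    ("widget", "West", "Q1"): 50,
    ("gadget", "East", "Q2"): 40,
    ("gadget", "West", "Q2"): 20,
}
DIMS = ("product", "region", "quarter")


def slice_cube(cube, dim, value):
    """Slice: keep the cells where one dimension equals a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep the cells whose coordinates fall inside the allowed sets."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def roll_up(cube, keep):
    """Roll up: sum away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional view."""
    return {(b, a): v for (a, b), v in table.items()}


q1 = slice_cube(cube, "quarter", "Q1")
east_widgets = dice_cube(cube, product={"widget"}, region={"East"})
by_product = roll_up(cube, ["product"])                   # most summarized
by_product_region = roll_up(cube, ["product", "region"])  # one level of drill-down
by_region_product = pivot(by_product_region)
print(by_product)
print(by_product_region)
```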

OLAP
Figure 4.10 Slicing Operations on a Simple Three-Dimensional Data Cube

Analytics In Action 4.2
AARP Transforms Its BI Infrastructure and Achieves a
347% ROI in Three Years

Questions for Discussion:
1. What were the challenges AARP was facing?
2. What was the approach for a potential solution?
3. What were the results obtained in the short term, and
what were the future plans?

Data Integration and the Extraction,
Transformation, and Load Process (1 of 2)
• ETL = Extract Transform Load
• Data integration
– Integration that comprises three major processes: data
access, data federation, and change capture.
• Enterprise application integration (EAI)
– A technology that provides a vehicle for pushing data
from source systems into a data warehouse
• Enterprise information integration (EII)
– An evolving tool space that promises real-time data
integration from a variety of sources, such as relational
or multidimensional databases, Web services, etc.
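The three ETL steps can be sketched end to end in a few lines. The Python below is a toy pipeline under stated assumptions: an in-memory CSV stands in for a source system, SQLite stands in for the warehouse, and the column names, cleaning rules, and data are hypothetical. Commercial ETL tools add scheduling, metadata capture, and connectors that this sketch leaves out.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source system (an in-memory CSV stands in for it).
source = io.StringIO(
    "customer,amount,date\n"
    " alice ,10.50,2024-01-05\n"
    "BOB,not_a_number,2024-01-06\n"
    "Carol,7,2024-01-07\n"
)
extracted = list(csv.DictReader(source))

# Transform: standardize names, convert types, and set aside rows that fail.
clean, rejected = [], []
for row in extracted:
    try:
        clean.append(
            (row["customer"].strip().title(), float(row["amount"]), row["date"])
        )
    except ValueError:
        rejected.append(row)

# Load: write the cleansed rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL, sale_date TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)

loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
print(len(rejected), "row(s) rejected")
```

Keeping the rejected rows rather than silently dropping them is a design choice: it lets someone inspect and correct bad source data later.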

Data Integration and the Extraction,
Transformation, and Load Process (2 of 2)

ETL (Extract, Transform, Load)
• Issues affecting the purchase of an ETL tool
– Data transformation tools are expensive
– Data transformation tools may have a long learning
curve
• Important criteria in selecting an ETL tool
– Ability to read from and write to an unlimited number of
data sources/architectures
– Automatic capturing and delivery of metadata
– A history of conforming to open standards
– An easy-to-use interface for the developer and the
functional user

DW Administration and Security
• Data warehouse administrator (DWA)
– DWA should …
▪ have the knowledge of high-performance software,
hardware, and networking technologies
▪ possess solid business knowledge and insight
▪ be familiar with the decision-making processes so as to
suitably design/maintain the data warehouse structure
▪ possess excellent communications skills
• Security and privacy is a pressing issue in DW
– Safeguarding the most valuable assets
– Government regulations (HIPAA, etc.)
– Must be explicitly planned and executed

The Future of DW
• Sourcing …
– Web, social media, and Big Data
– Open-source software
– SaaS (software as a service)
– Cloud computing
– Data lakes
• Infrastructure …
– Columnar
– Real-time DW
– Data warehouse appliances
– Data management practices/technologies
– In-database & in-memory processing
– New DBMS, advanced analytics, …

Technology Insights 4.2
Data Lakes
• Unstructured data storage technology for Big Data

Table 4.6 A Simple Comparison between a Data Warehouse and a Data Lake

Dimension          | Data Warehouse                   | Data Lake
The nature of data | Structured, processed            | Any data in raw/native format
Processing         | Schema-on-write (SQL)            | Schema-on-read (NoSQL)
Retrieval speed    | Very fast                        | Slow
Cost               | Expensive for large data volumes | Designed for low-cost storage
Agility            | Less agile, fixed configuration  | Highly agile, flexible configuration
Novelty/newness    | Not new/matured                  | Very new/maturing
Security           | Well-secured                     | Not yet well-secured
Users              | Business professionals           | Data scientists
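The schema-on-write versus schema-on-read distinction in the table can be sketched as follows. This is only an illustration: SQLite with NOT NULL columns stands in for a warehouse, a list of raw JSON strings stands in for a data lake, and the record contents are invented.

```python
import json
import sqlite3

# Schema-on-write: the structure is enforced at the moment data is stored.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE orders (order_id INTEGER NOT NULL, total REAL NOT NULL)")
dw.execute("INSERT INTO orders VALUES (?, ?)", (1, 25.0))
try:
    dw.execute("INSERT INTO orders VALUES (?, ?)", (2, None))  # violates NOT NULL
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True

# Schema-on-read: raw records are stored as-is; structure is applied when reading.
lake = [
    '{"order_id": 1, "total": 25.0}',
    '{"order_id": 2, "note": "total missing"}',
    '{"clickstream": ["home", "cart"]}',
]
parsed = [json.loads(line) for line in lake]
order_totals = [(r["order_id"], r.get("total")) for r in parsed if "order_id" in r]

print(write_rejected)
print(order_totals)
```

The lake accepted all three records, including one that is not an order at all, and left it to the reader to decide which fields matter. That flexibility is the agility the table refers to, and it is also why the burden of interpretation shifts to the person querying the data.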

Business Reporting Definitions and
Concepts
• Report = Information → Decision
• Report?
– Any communication artifact prepared to convey
specific information
• A report can fulfill many functions
– To ensure proper departmental functioning
– To provide information
– To provide the results of an analysis
– To persuade others to act
– To create an organizational memory …

What is a Business Report?
• A written document that contains information regarding
business matters.
• Purpose: to improve managerial decisions
• Source: data from inside and outside the organization (via
the use of ETL)
• Format: text + tables + graphs/charts
• Distribution: in-print, email, portal/intranet
Data acquisition → Information generation → Decision
making → Process management

Business Reporting

Types of Business Reports
• Metric Management Reports
– Help manage business performance through metrics
(SLAs for externals; KPIs for internals)
– Can be used as part of Six Sigma and/or TQM
• Dashboard-Type Reports
– Graphical presentation of several performance
indicators in a single page using dials/gauges
• Balanced Scorecard–Type Reports
– Include financial, customer, business process, and
learning & growth indicators

Data Visualization
“The use of visual representations to explore, make sense of,
and communicate data.”
• Data visualization versus information visualization
• Information = aggregation, summarization, and
contextualization of data
• Related to information graphics, scientific visualization,
and statistical graphics
• Often includes charts, graphs, illustrations, …

A Brief History of Data Visualization
• Data visualization can be dated back to the second century AD
• Most developments have occurred in the last two and a
half centuries
• Until recently it was not recognized as a discipline
• Today’s most popular visual forms date back a few
centuries

The First Line/Area Chart Created by
William Playfair in 1801
William Playfair is widely credited as the inventor of the
modern chart, having created the first line and pie charts.

Decimation of Napoleon’s Army
During the 1812 Russian Campaign

By Charles Joseph Minard
• Arguably the most popular multi-dimensional illustration!

Different Types of Charts and Graphs
• Basic Charts and Graphs
– Line chart
– Bar chart
– Pie chart
▪ Donut chart
– Scatter plot
▪ Bubble chart
• Specialized charts
– Histogram, Gantt chart, PERT chart, Geographic charts,
Bullet chart, Heat map, Highlight table, Tree map, …

Which Chart or Graph Should You
Use?

An Example Gapminder Chart: Wealth
and Health of Nations

• See gapminder.org for interesting animated visualizations.


The Emergence of Data Visualization
and Visual Analytics
• Magic Quadrant for Business Intelligence and Analytics
Platforms (Source: Gartner.com)
• Many data visualization companies represented
• There is a continuous move towards visualization in
analytics

Technology Insights 4.4
• Telling Great Stories with Data and Visualization
• Why story?
• What Is a Good Story?
• Storytelling best practices:
– Think of your analysis as a story—use a story
structure.
– Be authentic—your story will flow.
– Be visual—think of yourself as a film editor.
– Make it easy for your audience and you.
– Invite and direct discussion.

Visual Analytics
• A recently coined term
– Information visualization + predictive analytics
• Information visualization
– Descriptive, backward focused
– “what happened” “what is happening”
• Predictive analytics
– Predictive, future focused
– “what will happen” “why will it happen”
• There is a strong move toward visual analytics

Visual Analytics by SAS Institute (1 of 2)

• SAS Visual Analytics Architecture
– Big data + In memory + Massively parallel processing + …

Visual Analytics by SAS Institute (2 of 2)
• SAS Viya for Education – Analytics on the Cloud

Information Dashboards
• Information dashboards are commonly used in Business
Process Management (BPM) software suites and BI
platforms
• Dashboards provide visual displays of important
information that is consolidated and arranged on a single
screen so that information can be digested at a single
glance and easily drilled in and further explored

Performance Dashboards (1 of 3)

Performance Dashboards (2 of 3)
• Dashboard design
– The fundamental challenge of dashboard design is to
display all the required information on a single screen,
clearly and without distraction, in a manner that can be
assimilated quickly
• Three layers of information
– Monitoring
– Analysis
– Management

Performance Dashboards (3 of 3)
• What to look for in a dashboard
– Use of visual components to highlight data and
exceptions that require action
– Transparent to the user, meaning that they require
minimal training and are extremely easy to use
– Combine data from a variety of systems into a single,
summarized, unified view of the business
– Enable drill-down or drill-through to underlying data
sources or reports
– Present a dynamic, real-world view with timely data
– Require little coding to implement, deploy, and
maintain

Best Practices in Dashboard Design
• Benchmark KPIs with Industry Standards
• Wrap the Metrics with Contextual Metadata
• Validate the Design by a Usability Specialist
• Prioritize and Rank Alerts and Exceptions
• Enrich Dashboard with Business-User Comments
• Present Information in Three Different Levels
• Pick the Right Visual Constructs
• Provide for Guided Analytics
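Two of these practices, benchmarking KPIs against a standard and ranking alerts, lend themselves to a small sketch. The Python below is one possible way to do it, not a prescribed method: the KPI names, actual values, targets, and the shortfall_pct helper are all hypothetical.

```python
# Hypothetical KPIs: (name, actual, target, higher_is_better).
kpis = [
    ("on_time_delivery_pct", 91.0, 95.0, True),
    ("customer_churn_pct", 6.0, 4.0, False),
    ("net_promoter_score", 48.0, 45.0, True),
]


def shortfall_pct(actual, target, higher_is_better):
    """Relative miss versus target in percent; zero or negative means target met."""
    gap = (target - actual) if higher_is_better else (actual - target)
    return 100.0 * gap / target


# Keep only the KPIs that miss their target, worst miss first.
alerts = sorted(
    (
        (name, round(shortfall_pct(actual, target, hib), 1))
        for name, actual, target, hib in kpis
        if shortfall_pct(actual, target, hib) > 0
    ),
    key=lambda item: item[1],
    reverse=True,
)
print(alerts)
```

Expressing each miss relative to its own target puts KPIs with different units on a comparable scale, which is what makes a single ranked alert list possible.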

Analytics In Action 4.3
Increasing the Efficiency of Social Media Campaign
Reporting to Get to Insights Quicker
• Motivation
• Implementation
– Blending technologies for an automated, bespoke, and
branded reporting solution
• Results
– Automatic data processing and increased reporting
frequency
• Why KNIME?

End of Chapter 4
• Questions/Comments

Copyright

This work is protected by United States copyright laws and is
provided solely for the use of instructors in teaching their courses
and assessing student learning. Dissemination or sale of any part
of this work (including on the World Wide Web) will destroy the
integrity of the work and is not permitted. The work and materials
from it should never be made available to students except by
instructors using the accompanying text in their classes. All
recipients of this work are expected to abide by these restrictions
and to honor the intended pedagogical purposes and the needs of
other instructors who rely on these materials.
