
COLLEGE OF TECHNOLOGY AND DESIGN

FACULTY OF BUSINESS INFORMATION TECHNOLOGY

Chapter 02

NATURE OF DATA,
STATISTICAL MODELLING,
AND VISUALIZATION

Lecturer: Phạm Thị Thanh Tâm, MSc


Email: tamptt@ueh.edu.vn
AGENDA

1. THE NATURE OF DATA
2. A SIMPLE TAXONOMY OF DATA
3. THE ART AND SCIENCE OF DATA PREPROCESSING
4. STATISTICAL MODELLING FOR BUSINESS ANALYTICS
5. DESCRIPTIVE STATISTICS FOR DESCRIPTIVE ANALYTICS

15/08/2023 Course: Intelligent Management Support Systems


AGENDA

6. REGRESSION MODELLING FOR INFERENTIAL STATISTICS
7. BUSINESS REPORTING DEFINITIONS AND CONCEPTS
8. DATA VISUALIZATION
9. WHICH CHART OR GRAPH SHOULD YOU USE?
10. THE EMERGENCE OF VISUAL ANALYTICS



1. THE NATURE OF DATA



1. The Nature of Data

• Data: a collection of facts
  - Usually obtained as the result of experiences, observations, or experiments
• Data may consist of numbers, words, images, …
• Data is the lowest level of abstraction (from which information and knowledge are derived)
• Data is the source for information and knowledge
• Data quality and data integrity are critical to analytics


1. The Nature of Data

• Metrics for Analytics-Ready Data
  - Data source reliability
  - Data content accuracy
  - Data accessibility
  - Data security and data privacy
  - Data richness



1. The Nature of Data

• Metrics for Analytics-Ready Data (continued)
  - Data consistency
  - Data currency/data timeliness
  - Data granularity
  - Data validity
  - Data relevancy



2. A SIMPLE TAXONOMY OF DATA



2. A Simple Taxonomy of Data

• Data (datum is the singular form of data): facts
• Structured data
  - Targeted for computers to process
• Unstructured/textual data
  - Targeted for humans to process/digest
• Semi-structured data?
  - XML, HTML, log files, etc.



2. A Simple Taxonomy of Data

• Categorical data:
  - Nominal data: gender (male, female), marital status (single, married, divorced)
  - Ordinal data: credit score (low, medium, high), age group (child, young, middle-aged, elderly), education level (high school, college, graduate school), …



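2. A Simple Taxonomy of Data

• Numerical data:
  - Interval: measured on a scale with equal intervals but no true zero (temperature in °C or °F, test score, grade, …)
  - Ratio: measured on a scale with a true zero, so ratios are meaningful (mass, length, time, energy, income, salary, …)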



2. A Simple Taxonomy of Data

• Nominal: gender, race, lunch, test preparation course
• Ordinal: level of education
• Numerical: math_score, reading_score, writing_score
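To make the taxonomy concrete, the sketch below tags a few columns with their measurement type and converts an ordinal column to ranks. The column names follow the slide; the sample rows, the `EDUCATION_ORDER` list, and the helper name `encode_ordinal` are assumptions made up for illustration.

```python
# Hypothetical sample rows; the values are invented for illustration.
students = [
    {"gender": "female", "level_of_education": "college", "math_score": 72},
    {"gender": "male", "level_of_education": "high school", "math_score": 65},
    {"gender": "female", "level_of_education": "graduate school", "math_score": 90},
]

# Measurement type of each column, following the classification on the slide.
COLUMN_TYPES = {
    "gender": "nominal",
    "level_of_education": "ordinal",
    "math_score": "numerical",
}

# Assumed ordering of the ordinal categories, from lowest to highest.
EDUCATION_ORDER = ["high school", "college", "graduate school"]


def encode_ordinal(value, order):
    """Map an ordinal category to its rank (0 = lowest)."""
    return order.index(value)


ranks = [encode_ordinal(s["level_of_education"], EDUCATION_ORDER) for s in students]
print(ranks)  # [1, 0, 2]
```

Ranks preserve the order of the categories, but the gaps between ranks are not meaningful distances. Nominal columns such as gender have no order at all, so they should not be rank-encoded this way.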
Application Case 2.1

Medical Device Company Ensures Product Quality While Saving Money

Questions for Discussion
1. What were the main challenges for the medical device company? Were they market or technology driven?
2. What was the proposed solution?
3. What were the results? What do you think was the real return on investment (ROI)?



3. THE ART AND SCIENCE OF
DATA PREPROCESSING



3. The Art and Science of Data Preprocessing

• Real-world data is dirty, misaligned, overly complex, and inaccurate
  - Not ready for analytics!
• Readying the data for analytics is needed
  - Data preprocessing
• Art: it develops and improves with experience
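As a small illustration of what "readying the data" can involve, here is a minimal sketch of two common preprocessing steps: mean imputation of missing values followed by min-max normalization. The slides do not prescribe these particular techniques; the function names, the use of `None` to mark missing values, and the sample column are all assumptions.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


raw_income = [40, None, 60, 100]  # made-up column with one missing value
clean = impute_mean(raw_income)
scaled = min_max_scale(clean)
print(clean)
print(scaled)
```

Which imputation or scaling method is appropriate depends on the data and the model, which is part of why preprocessing is described as an art.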



Application Case 2.2
Improving Student Retention with Data-Driven Analytics

Questions for Discussion
1. What is student attrition, and why is it an important problem in higher education?
2. What were the traditional methods to deal with the attrition problem?
3. List and discuss the data-related challenges within the context of this case study.
4. What was the proposed solution? What were the results?
Application Case 2.2

• Student retention
  - Freshman class
  - Why is it important?
• What are the common techniques to deal with student attrition?
• Analytics vs. theoretical approaches to the student retention problem



Application Case 2.2

• Data imbalance problem

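One simple way to handle class imbalance is random oversampling of the minority class. The sketch below is only an illustration under assumed names and made-up data (8 retained students versus 2 who dropped out); it is not necessarily the technique used in the case study.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority-class size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Made-up example: 8 retained students and 2 who dropped out.
rows = [{"id": i, "dropped_out": i >= 8} for i in range(10)]
balanced = random_oversample(rows, "dropped_out")
counts = Counter(r["dropped_out"] for r in balanced)
print(counts)
```

Resampling of this kind is normally applied to the training data only, so that evaluation still reflects the real class proportions.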




4. STATISTICAL MODELLING FOR
BUSINESS ANALYTICS

4. Statistical Modelling for Business Analytics

• Statistics: a collection of mathematical techniques to characterize and interpret data
• Descriptive statistics
  - Describing the data (as it is)
• Inferential statistics
  - Drawing inferences about the population based on sample data



5. DESCRIPTIVE STATISTICS FOR
DESCRIPTIVE ANALYTICS



5. Descriptive Statistics for Descriptive Analytics

• Measures of Central Tendency
  - Mean: the arithmetic average of the observations
  - Median: the middle value when the observations are sorted
  - Mode: the most frequent observation



5. Descriptive Statistics for Descriptive Analytics

No.  Salary_Bank 1  Salary_Bank 2
1    35             17
2    15             10
3    5              22
4    8              14
5    15             15
6    18             19
7    15             15
8    28             20
9    5              12
10   15             10
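The three measures of central tendency can be computed for the table above with Python's standard `statistics` module. This is a minimal sketch; the variable names are assumptions, and the values are copied from the table.

```python
from statistics import mean, median, multimode

salary_bank_1 = [35, 15, 5, 8, 15, 18, 15, 28, 5, 15]
salary_bank_2 = [17, 10, 22, 14, 15, 19, 15, 20, 12, 10]

for name, data in [("Salary_Bank 1", salary_bank_1), ("Salary_Bank 2", salary_bank_2)]:
    # multimode returns every value tied for the highest frequency.
    print(name, "mean:", mean(data), "median:", median(data), "mode(s):", multimode(data))
```

Both series have a median of 15 and similar means (15.9 and 15.4), so central tendency alone does not distinguish them well. Salary_Bank 2 also has two modes (10 and 15), which is why `multimode` is used rather than `mode`.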
5. Descriptive Statistics for Descriptive Analytics

• Measures of Dispersion: degree of variation in a given variable
  - Range: Max − Min
  - Variance and Standard Deviation
  - Mean Absolute Deviation (MAD): average absolute deviation from the mean
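The dispersion measures listed above can be sketched as follows for the Salary_Bank 1 values. The helper names are assumptions; `variance` and `stdev` from the standard library use the sample (n − 1) formulas.

```python
from statistics import mean, stdev, variance

salary_bank_1 = [35, 15, 5, 8, 15, 18, 15, 28, 5, 15]


def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def mean_absolute_deviation(data):
    """Average absolute distance of each observation from the mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)


print("range:", value_range(salary_bank_1))
print("sample variance:", variance(salary_bank_1))
print("sample std dev:", stdev(salary_bank_1))
print("MAD:", mean_absolute_deviation(salary_bank_1))
```

Use `pvariance` and `pstdev` instead when the data are the whole population rather than a sample.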



5. Descriptive Statistics for Descriptive Analytics

[Chart: Salary_Bank 1 by observation number (1 to 10); vertical axis from 0 to 40]



5. Descriptive Statistics for Descriptive Analytics

• Quartiles
• Box-and-Whiskers Plot
  - a.k.a. box plot
  - Versatile/informative
  - Can show variation within a data set
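The numbers behind a box plot can be sketched as below, using the Salary_Bank 1 values from the earlier table. The 1.5 × IQR rule for whiskers and outliers is a common convention assumed here, and `box_plot_summary` is a hypothetical helper name.

```python
from statistics import quantiles

salary_bank_1 = [35, 15, 5, 8, 15, 18, 15, 28, 5, 15]


def box_plot_summary(data):
    """Return quartiles, IQR, whisker ends, and outliers (1.5 * IQR rule)."""
    q1, q2, q3 = quantiles(data, n=4)  # uses the default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


summary = box_plot_summary(salary_bank_1)
print(summary)
```

Quartile conventions differ between tools, so a spreadsheet or another library may report slightly different Q1 and Q3 values for the same data.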



5. Descriptive Statistics for Descriptive Analytics

• Histogram: frequency chart
• Skewness: measure of asymmetry
• Kurtosis: how peaked (tall and skinny) the distribution is
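Skewness and kurtosis can be computed from central moments. The sketch below uses the population (uncorrected) formulas, with excess kurtosis defined so that a normal distribution scores 0; the function names are assumptions.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    """Population skewness: m3 / m2 ** 1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2 ** 2 - 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]
print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

A positive skewness indicates a longer right tail and a negative one a longer left tail. Statistical packages often apply sample-size corrections, so their values can differ from these uncorrected ones on small data sets.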



5. Descriptive Statistics for Descriptive Analytics

• Relationship Between Dispersion and Shape Properties



Technology Insights 2.1

• Descriptive Statistics in Excel



5. Descriptive Statistics for Descriptive Analytics

• Descriptive Statistics in Excel: creating a box plot in Microsoft Excel



Application Case 2.3

Town of Cary Uses Analytics to Analyze Data from Sensors, Assess Demand, and Detect Problems

Questions for Discussion
1. What were the challenges the Town of Cary was facing?
2. What was the proposed solution?
3. What were the results?
4. What other problems and data analytics solutions do you foresee for towns like Cary?
6. REGRESSION MODELLING FOR
INFERENTIAL STATISTICS



6. Regression Modelling for Inferential Statistics

• Regression
  - A part of inferential statistics
  - The most widely known and used analytics technique in statistics
  - Used to characterize the relationship between explanatory (input) and response (output) variables
• It can be used for
  - Hypothesis testing (explanation)
  - Forecasting (prediction)



6. Regression Modelling for Inferential Statistics

• Regression Modeling
  - Correlation vs. Regression
    - What is the difference (or relationship)?
  - Simple Regression vs. Multiple Regression
    - Based on the number of input variables

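One way to see the relationship between correlation and simple regression: the Pearson correlation r measures the strength of a linear association without assigning input and output roles, while the regression slope equals r scaled by the ratio of the standard deviations of y and x. The sketch below checks this on made-up data; the function names are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def regression_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs = [1, 2, 3, 4, 5]  # made-up input values
ys = [2, 4, 5, 4, 5]  # made-up output values
r = pearson_r(xs, ys)
slope = regression_slope(xs, ys)
print(r, slope)
```

Correlation is symmetric in x and y, whereas the regression of y on x generally has a different slope from the regression of x on y.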


6. Regression Modelling for Inferential Statistics

• How do we develop linear regression models?
  - Scatter plots (visualization, for simple regression)
  - Ordinary least squares method
    - A line that minimizes the sum of squared errors
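For simple regression, the ordinary least squares line has a closed-form solution. The following is a minimal sketch under assumed names and made-up data, not a substitute for a statistics package.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]  # made-up inputs
ys = [3, 5, 7, 9]  # made-up outputs lying exactly on y = 1 + 2x
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1, sum_squared_errors(xs, ys, b0, b1))
```

Any other line through the same points gives a larger sum of squared errors, which is the sense in which the fitted line is "best".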



6. Regression Modelling for Inferential Statistics

• x: input, y: output
• Simple Linear Regression: y = β0 + β1x
• Multiple Linear Regression: y = β0 + β1x1 + β2x2 + … + βnxn
• The meaning of Beta (β) coefficients
  - Sign (+ or -) and magnitude



6. Regression Modelling for Inferential Statistics

• Process of Developing a Regression Model



6. Regression Modelling for Inferential Statistics

• How do we know if the model is good enough?
  - R² (R-squared)
  - p-values
  - Error measures (for prediction problems)
    - MSE, MAD, RMSE

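The fit and error measures named above can be sketched from their definitions. The function name and the sample values are assumptions; here MAD means the mean absolute deviation of the predictions from the actual values.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Compute R-squared, MSE, RMSE, and MAD for a set of predictions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # unexplained variation
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mse": mse,
        "rmse": sqrt(mse),
        "mad": sum(abs(e) for e in errors) / n,
    }


actual = [3, 5, 7, 9]     # made-up observed values
predicted = [2, 5, 8, 9]  # made-up model outputs
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAD are in the same units as the output variable, which makes them easier to interpret than MSE. R² is unit-free and describes the share of variation explained.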


6. Regression Modelling for Inferential Statistics

• Regression Modeling Assumptions
  - Linearity
  - Independence (of errors)
  - Normality (normally distributed errors)
  - Constant variance (of errors)
  - No multicollinearity (among input variables)
• What happens if the assumptions do NOT hold?
  - What do we do then?



Example: Linear Regression

• R-squared = 0.98



6. Regression Modelling for Inferential Statistics

• Logistic Regression Modeling
  - A very popular statistics-based classification algorithm
  - Employs supervised learning
  - Developed in the 1940s
• The difference between Linear Regression and Logistic Regression
  - In logistic regression, the output/target variable is binomial (binary classification) rather than numeric
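The key mechanical difference can be sketched in a few lines: logistic regression passes the same kind of linear combination through the logistic (sigmoid) function to obtain a probability, which is then thresholded into a class. The coefficients below are hypothetical rather than fitted, and the function names are assumptions.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(x, intercept, slope):
    """Probability of the positive class for a single input x."""
    return sigmoid(intercept + slope * x)


def predict_class(x, intercept, slope, threshold=0.5):
    """Binary class label obtained by thresholding the probability."""
    return 1 if predict_proba(x, intercept, slope) >= threshold else 0


# Hypothetical coefficients, chosen only to illustrate the calculation.
INTERCEPT, SLOPE = -4.0, 1.0
for x in (2, 4, 6):
    print(x, round(predict_proba(x, INTERCEPT, SLOPE), 3), predict_class(x, INTERCEPT, SLOPE))
```

In practice the coefficients are estimated from labeled data, typically by maximum likelihood rather than least squares.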


Application Case 2.4

• Predicting NCAA Bowl Game Outcomes



Application Case 2.4

• The analytics process to develop prediction models (both regression and classification type) for NCAA Bowl Game outcomes



6. Regression Modelling for Inferential Statistics

• Prediction Results
  - Classification
  - Regression



Application Case 2.4

Questions for Discussion


1. What are the foreseeable challenges in predicting
sporting event outcomes (e.g., college bowl
games)?
2. How did the researchers formulate/design the
prediction problem (i.e., what were the inputs
and output, and what was the representation of
a single sample—row of data)?
3. How successful were the prediction results? What
else can they do to improve the accuracy?



6. Regression Modelling for Inferential Statistics

• Time Series Forecasting
  - Is it different from Simple Linear Regression? How?
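One difference worth noting is that a time series forecast uses the variable's own past values as inputs instead of a separate explanatory variable. A moving-average forecast is among the simplest examples. The sketch below uses made-up data and an assumed function name, and it illustrates only one of many forecasting methods.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the averaging window")
    return sum(series[-window:]) / window


monthly_sales = [10, 12, 13, 15, 14, 16]  # made-up observations in time order
forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)  # 15.0
```

The order of the observations matters here, whereas shuffling the rows of a regression data set would not change the fitted line.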



7. BUSINESS REPORTING
DEFINITIONS AND CONCEPTS



7. Business Reporting Definitions and Concepts

• Report = Information → Decision
• What is a report?
  - Any communication artifact prepared to convey specific information
• A report can fulfill many functions
  - To ensure proper departmental functioning
  - To provide information
  - To provide the results of an analysis
  - To persuade others to act
  - To create an organizational memory, …



7. Business Reporting Definitions and Concepts

• What is a Business Report?
  - A written document that contains information regarding business matters
  - Purpose: to improve managerial decisions
  - Source: data from inside and outside the organization (via the use of ETL)
  - Format: text + tables + graphs/charts
  - Distribution: in print, email, portal/intranet


7. Business Reporting Definitions and Concepts

• Types of Business Reports
  - Metric Management Reports
    - Help manage business performance through metrics (SLAs for externals; KPIs for internals)
    - Can be used as part of Six Sigma and/or TQM
  - Dashboard-Type Reports
    - Graphical presentation of several performance indicators on a single page using dials/gauges
  - Balanced Scorecard-Type Reports
    - Include financial, customer, business process, and learning & growth indicators



Application Case 2.5

Flood of Paper Ends at FEMA (Federal Emergency Management Agency)

Questions for Discussion
1. What does FEMA do?
   - Helps people before, during, and after disasters
2. What are the main challenges that FEMA faces?
3. How did FEMA improve its inefficient reporting practices?
   - WebFOCUS solution
8. DATA VISUALIZATION



8. Data Visualization

“The use of visual representations to explore, make sense of, and communicate data.”
• Data visualization vs. information visualization
  - Information = aggregation, summarization, and contextualization of data
• Related to information graphics, scientific visualization, and statistical graphics
• Often includes charts, graphs, illustrations, …



8. Data Visualization

• A Brief History of Data Visualization
  - Data visualization can be dated back to the second century AD
  - Most developments have occurred in the last two and a half centuries
  - Until recently it was not recognized as a discipline
  - Today’s most popular visual forms date back a few centuries


Application Case 2.6

Macfarlan Smith Improves Operational Performance Insight with Tableau Online

Questions for Discussion
1. What data- and reporting-related challenges was Macfarlan Smith facing?
2. What was the solution, and what results and/or benefits were obtained?
9. WHICH CHART OR GRAPH
SHOULD YOU USE?



9. Which Chart or Graph Should You Use?

• Basic Charts and Graphs
  - Line Chart
  - Bar Chart
  - Pie Chart
  - Scatter Plot
  - Bubble Chart



9. Which Chart or Graph Should You Use?

• Specialized Charts and Graphs
  - Histogram
  - Gantt Chart
  - PERT Chart
  - Geographic Map
  - Bullet
  - Heat Map
  - Highlight Table
  - Tree Map



An Example Gapminder Chart
Wealth and Health of Nations



10. THE EMERGENCE OF
VISUAL ANALYTICS



10. The Emergence of Visual Analytics

• Magic Quadrant for Business Intelligence and Analytics Platforms (Source: Gartner.com)
  - Many data visualization companies are in the 4th quadrant
  - There is a move towards visualization

10. The Emergence of Visual Analytics

• Emergence of new companies
  - Tableau, Spotfire, QlikView, …
• Increased focus by the big players
  - MicroStrategy improved Visual Insight
  - SAP launched Visual Intelligence
  - SAS launched Visual Analytics
  - Microsoft bolstered PowerPivot with Power View
  - IBM launched Cognos Insight
  - Oracle acquired Endeca
10. The Emergence of Visual Analytics

• Visual Analytics
  - A recently coined term
  - Information visualization + predictive analytics
• Information visualization
  - Descriptive, backward focused
  - “What happened?” “What is happening?”
• Predictive analytics
  - Predictive, future focused
  - “What will happen?” “Why will it happen?”
• There is a strong move toward visual analytics



Technology Insights 2.3

• Telling Great Stories with Data and Visualization



10. The Emergence of Visual Analytics

• Visual Analytics by SAS Institute



Application Case 2.7

Dallas Cowboys Score Big with Tableau and Teknion

Questions for Discussion
1. How did the Dallas Cowboys use information visualization?
2. What were the challenge, the proposed solution, and the obtained results?



10. PERFORMANCE DASHBOARD



10. Performance Dashboard

• Performance dashboards are commonly used in BPM software suites and BI platforms
• Dashboards provide visual displays of important information that is consolidated and arranged on a single screen so that it can be digested at a single glance and easily drilled into and further explored

10. Performance Dashboard

• Dashboard design
  - The fundamental challenge of dashboard design is to display all the required information on a single screen, clearly and without distraction, in a manner that can be assimilated quickly
• Three layers of information
  - Monitoring
  - Analysis
  - Management



10. Performance Dashboard

• What to look for in a dashboard
  - Use of visual components to highlight data and exceptions that require action
  - Transparent to the user, meaning that they require minimal training and are extremely easy to use
  - Combine data from a variety of systems into a single, summarized, unified view of the business
  - Enable drill-down or drill-through to underlying data sources or reports
  - Present a dynamic, real-world view with timely data
  - Require little coding to implement, deploy, and maintain



10. Performance Dashboard

• Best Practices in Dashboard Design
  - Benchmark KPIs with Industry Standards
  - Wrap the Metrics with Contextual Metadata
  - Validate the Design by a Usability Specialist
  - Prioritize and Rank Alerts and Exceptions
  - Enrich Dashboard with Business-User Comments
  - Present Information in Three Different Levels
  - Pick the Right Visual Constructs
  - Provide for Guided Analytics
Application Case 2.8

Visual Analytics Helps Energy Supplier Make Better Connections

Questions for Discussion
1. Why do you think energy supply companies are among the prime users of information visualization tools?
2. How did Electrabel use information visualization for the single version of the truth?
3. What were their challenges, the proposed solution, and the obtained results?
Q&A
