Foundations of Data Science - Unit 5
Data Science
Unit 5
Acknowledgement
▪ Most of the slides in this presentation are taken from material provided by:
▪ A Reader on Data Visualization
(https://mschermann.github.io/data_viz_reader/)
▪ John Canny’s presentation on Data Visualization at UC Berkeley
Zarmeen Nasim, Spring 2021
Data Visualization
▪ Data visualization is the process of creating interactive visuals to understand
trends and variations in data and to derive meaningful insights from it.
▪ The main goal of data visualization is to communicate information clearly and
effectively through graphical means.
Uses of Data Visualization
▪ Identify relationships among attributes
▪ Detect outliers
▪ Discover structure
▪ Communication
Types of Analysis and Charts to Visualize Them
Trend Analysis
▪ Trend analysis examines the rate of growth or decline (the trend) between
different periods of time.
▪ Types of Charts for Trend Analysis
▪ Line Chart
▪ Sparkline
▪ Scatter Plot
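The "rate of growth or decline between periods" mentioned above can be sketched as a period-over-period percentage change. This is a minimal illustration, not part of the original slides: the function name `period_growth_rates` and the sales figures are made up for the example.

```python
def period_growth_rates(values):
    """Return the percentage change between each pair of consecutive periods."""
    rates = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            # Growth relative to zero is undefined, so mark it as missing.
            rates.append(None)
        else:
            rates.append((current - previous) * 100 / previous)
    return rates


# Hypothetical yearly sales figures, invented for illustration only.
sales = [100, 120, 90, 90]
rates = period_growth_rates(sales)
print(rates)  # positive values mean growth, negative values mean decline
```

Plotting these rates (or the raw values) against time is what a line chart or sparkline would show.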
Charts for Trend Analysis
[Figures: Line Chart; Sparkline Chart]
Scatter Plot
▪ A scatter plot is a two-dimensional data visualization that uses dots to
represent the values of two variables, one plotted along the x-axis and the
other along the y-axis.
▪ Scatter plots are very useful for conveying the relationship between two
variables, but they must be constructed and interpreted with care.
Correlation Analysis
▪ Identifies relationships between attributes
▪ The most common type of analysis in data analysis.
▪ For example:
▪ Does smoking cause cancer?
▪ Does the price of petrol affect the price of gold?
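One common way to quantify the relationship between two numerical attributes is the Pearson correlation coefficient. The slides do not name a specific measure, so treat this as one possible sketch; the function name `pearson_r` and the price series are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical prices, invented for illustration only.
petrol_price = [1.0, 2.0, 3.0, 4.0, 5.0]
gold_price = [10.0, 20.0, 30.0, 40.0, 50.0]
r = pearson_r(petrol_price, gold_price)
print(r)  # close to +1 for a perfectly increasing linear relationship
```

A value near +1 or -1 indicates a strong linear association, and a value near 0 indicates little linear association. A high correlation by itself does not establish that one attribute causes the other.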
Scatter Plot for Correlation Analysis
Source: https://mste.illinois.edu/courses/ci330ms/youtsey/scatterinfo.html
Heatmap
▪ Used to visualize a pairwise correlation matrix
▪ The intensity of the color shows the degree of correlation between each
pair of attributes
Source: https://towardsdatascience.com/data-visualization-in-data-science-5681cbdde5bf
Distribution/Composition Analysis
▪ To understand the distribution of values in an attribute.
▪ Types of Charts for Distribution Analysis
▪ Pie Chart
▪ Histogram
▪ Bar Chart
▪ Stacked Bar Chart
Fig 1: Pie Chart
Histogram Vs. Bar Charts
▪ Histograms are used to visualize the distribution of a numerical variable,
whereas bar charts are used to visualize the distribution of a categorical
variable.
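The difference shows up in how the data is counted before plotting: a histogram groups numerical values into bins, while a bar chart counts occurrences of each category. The sketch below illustrates both; the helper `histogram_counts`, the bin settings, and the sample data are assumptions made for the example.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count numerical values in equal-width bins, keyed by each bin's lower edge."""
    counts = Counter()
    for v in values:
        bin_index = int((v - start) // bin_width)
        counts[start + bin_index * bin_width] += 1
    return dict(sorted(counts.items()))


# Numerical variable (hypothetical ages): grouped into bins of width 10.
ages = [21, 25, 34, 37, 38, 42]
age_hist = histogram_counts(ages, bin_width=10, start=20)
print(age_hist)

# Categorical variable (hypothetical cities): one count per distinct category.
cities = ["Karachi", "Lahore", "Karachi", "Quetta", "Karachi"]
city_counts = Counter(cities)
print(dict(city_counts))
```

Changing `bin_width` changes the shape of the histogram, whereas the bar chart counts depend only on the categories present.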
Geographical Data Analysis
Quiz 1
Instructions for Online Participants
[Table: records numbered by S.No, 0 to 9]
Example 1: If your ERP ID is 14669, then you should pick the 1st, 4th, 6th,
6th and 9th records. In addition, add 1 to each digit of your ID (separately,
with no carrying forward or backward, so 9 becomes 0). This implies
14669 → 25770 (1 is added to each digit). As a result, pick the 2nd, 5th, 7th,
7th and 0th records too. The combined dataset for this ERP ID would be
[1, 4, 6, 6, 9, 2, 5, 7, 7, 0].
Example 2: Similarly, if your ERP ID is 19805, then 19805 → 20916. The
combined set of records for this ID would be [1, 9, 8, 0, 5, 2, 0, 9, 1, 6].
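The record-selection rule in the two examples can be sketched in a few lines. The function name `records_for_erp_id` is made up for this illustration; the rule itself (each digit, followed by each digit plus 1 with 9 wrapping to 0) is taken from the examples above.

```python
def records_for_erp_id(erp_id):
    """Return the record numbers to pick for a given ERP ID."""
    text = str(erp_id)
    if not text.isdigit():
        raise ValueError("ERP ID must contain digits only")
    digits = [int(ch) for ch in text]
    # Add 1 to each digit separately, with no carrying: 9 wraps around to 0.
    shifted = [(d + 1) % 10 for d in digits]
    return digits + shifted


print(records_for_erp_id(14669))  # Example 1 from the instructions
print(records_for_erp_id(19805))  # Example 2 from the instructions
```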