Foundations of Data Science - Unit 5

Acknowledgement
▪ Most of the slides in this presentation are taken from material provided by:
▪ A Reader on Data Visualization
(https://mschermann.github.io/data_viz_reader/)
▪ John Canny’s presentation on data visualization at the University of California, Berkeley

Zarmeen Nasim, Spring 2021
Data Visualization
▪ Data visualization is the process of creating interactive visuals to understand
trends and variations and to derive meaningful insights from data.
▪ The main goal of data visualization is to communicate information clearly and
effectively through graphical means.

Uses of Data Visualization
▪ Identify relationships among attributes
▪ Detect outliers
▪ Discover structure
▪ Communication

Types of Analysis and Charts to Visualize
Trend Analysis
▪ Trend analysis examines the rate of growth or decline (the trend) between
different periods of time.
▪ Types of Charts for Trend Analysis
▪ Line Chart
▪ Sparkline
▪ Scatter Plot
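
The "rate of growth or decline" between periods can be computed before any chart is drawn. The following is a minimal sketch, assuming the data is a plain list of values ordered by time; the function name growth_rates and the sales figures are made up for illustration and do not come from the slides.

```python
def growth_rates(values):
    """Return period-over-period growth rates as fractions (0.10 means +10%)."""
    rates = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            # Growth relative to zero is undefined, so mark it explicitly.
            rates.append(None)
        else:
            rates.append((current - previous) / previous)
    return rates


# Hypothetical yearly sales figures, invented for this example.
sales = [100, 110, 99, 120]
rates = growth_rates(sales)

for rate in rates:
    print("n/a" if rate is None else f"{rate:+.1%}")
```

Positive values indicate growth and negative values indicate decline; these are the quantities a line chart of the same series lets the reader judge by eye.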

Charts for Trend Analysis
[Figures: Line Chart; Sparkline Chart]
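
A sparkline is a very small line chart without axes or labels. As a rough illustration of the idea, the sketch below renders one in plain text with Unicode block characters. The function name sparkline and the sample series are assumptions for this example; a plotting library would normally be used for real charts.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block characters, from lowest to highest


def sparkline(values):
    """Map each value to a block character scaled between min and max."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A constant series has no trend; draw a flat line.
        return BARS[0] * len(values)
    chars = []
    for value in values:
        index = round((value - low) / span * (len(BARS) - 1))
        chars.append(BARS[index])
    return "".join(chars)


# Hypothetical monthly values, invented for this example.
print(sparkline([3, 5, 4, 8, 10, 9, 12]))
```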

Scatter Plot
▪ A scatter plot is a two-dimensional data visualization that uses dots to
represent the values of two different variables: one plotted along the x-axis
and the other plotted along the y-axis.
▪ Scatter plots are useful for conveying the relationship between two variables,
provided they are used and interpreted properly.

Correlation Analysis
▪ Identifies relationships between attributes
▪ The most common type of analysis in data analysis
▪ For example:
▪ Does smoking cause cancer?
▪ Does the price of petrol affect the price of gold?

▪ Types of Charts for Correlation Analysis
▪ Scatter Plot
▪ Heatmap
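
The strength of a linear relationship between two attributes is commonly summarized by the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below is one way to compute it with the standard library only; the function name pearson and the price figures are invented for illustration. Note that a high correlation alone does not establish that one attribute causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return covariance / sqrt(var_x * var_y)


# Hypothetical prices, invented for this example (not real market data).
petrol = [1.0, 1.2, 1.1, 1.5, 1.7]
gold = [50.0, 54.0, 53.0, 60.0, 66.0]
print(f"r = {pearson(petrol, gold):.2f}")
```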

Scatter Plot for Correlation Analysis

Source: https://mste.illinois.edu/courses/ci330ms/youtsey/scatterinfo.html

Heatmap
▪ Used to visualize a pairwise correlation matrix
▪ The intensity of the color shows the degree of correlation between
attributes

Source: https://towardsdatascience.com/data-visualization-in-data-science-5681cbdde5bf
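
The pairwise correlation matrix that a heatmap colors can be built by computing the correlation for every pair of attributes. The following is a minimal standard-library sketch; the names pearson and correlation_matrix and the small dataset are assumptions made for this example. In practice a data-analysis library would usually compute the matrix and a plotting library would draw the colored grid.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {a: {b: r}} holding the correlation of every attribute pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical dataset, invented for this example.
data = {
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 69, 80],
    "absences": [9, 7, 4, 2],
}
matrix = correlation_matrix(data)

# Print the matrix as text; a heatmap would map each number to a color.
for row_name, row in matrix.items():
    print(row_name.ljust(9), " ".join(f"{value:+.2f}" for value in row.values()))
```

The diagonal is always 1 because every attribute is perfectly correlated with itself, and the matrix is symmetric, which is why heatmaps sometimes show only one triangle.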

Distribution/Composition Analysis
▪ To understand the distribution of values in an attribute.
▪ Types of Charts for Distribution Analysis
▪ Pie Chart (Fig 1)
▪ Histogram
▪ Bar Chart (Fig 2)
▪ Stacked Bar Chart (Fig 3)

[Figures: Fig 1: Pie Chart; Fig 2: Bar Chart; Fig 3: Stacked Bar Chart]

Histogram Vs. Bar Charts
▪ Histograms are used to visualize the distribution of a numerical variable,
whereas bar charts are used to visualize the distribution of a categorical
variable.
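
The difference shows up in how the data is counted before plotting: a histogram groups numerical values into ranges (bins), while a bar chart counts occurrences of each category. The sketch below illustrates both with the standard library; the function names, the bin width, and the sample values are assumptions chosen for this example.

```python
from collections import Counter


def histogram_counts(values, bin_width, start=0):
    """Count numerical values per bin [lower, lower + bin_width)."""
    counts = Counter()
    for value in values:
        bin_index = int((value - start) // bin_width)
        lower = start + bin_index * bin_width
        counts[(lower, lower + bin_width)] += 1
    return dict(sorted(counts.items()))


def bar_counts(categories):
    """Count how often each category occurs."""
    return dict(Counter(categories))


# Hypothetical data, invented for this example.
ages = [21, 25, 34, 38, 39, 47]              # numerical: histogram
departments = ["CS", "EE", "CS", "BBA", "CS"]  # categorical: bar chart

print(histogram_counts(ages, bin_width=10, start=20))
print(bar_counts(departments))
```

Changing bin_width changes the shape of the histogram, whereas the bars of a bar chart are fixed by the categories themselves.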

Geographical Data Analysis

Quiz 1
Instructions for Online Participants

[Table: S.No column of the record table: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

Example 1:
If your ERP ID is 14669, you should pick the 1st, 4th, 6th, 6th and 9th records.
In addition, add 1 to each digit in your ID (separately, no carrying
forward/backward). This implies: 14669 → 25770 (1 is added to each digit). As a
result, pick the 2nd, 5th, 7th, 7th and 0th records too. The combined dataset for
this ERP ID would be [1, 4, 6, 6, 9, 2, 5, 7, 7, 0].

Example 2:
Similarly, if your ERP ID is 19805, then 19805 → 20916. The combined set of
records for this ID would be [1, 9, 8, 0, 5, 2, 0, 9, 1, 6].
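
The selection rule in the two examples can be checked with a few lines of code. This is only a sketch of the rule as stated above (each digit, followed by each digit plus 1 with 9 wrapping to 0); the function name records_for_erp_id is made up for this illustration.

```python
def records_for_erp_id(erp_id):
    """Return the record numbers selected by an ERP ID.

    The digits of the ID come first, followed by each digit plus 1,
    where 9 wraps around to 0 (no carrying).
    """
    text = str(erp_id)
    if not text.isdigit():
        raise ValueError("ERP ID must contain digits only")
    digits = [int(ch) for ch in text]
    shifted = [(d + 1) % 10 for d in digits]
    return digits + shifted


print(records_for_erp_id(14669))  # Example 1 from the instructions
print(records_for_erp_id(19805))  # Example 2 from the instructions
```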

