INSY662 - F23 - Week 2-1
▪ Coding session
Exploratory Data Analysis
▪ Allows deeper understanding of the dataset
Exploratory Data Analysis
▪ Common steps
– Understand the list of columns, their data types,
whether they have missing values or not, etc.
– For categorical variables, check the count of
each category
– For numerical variables, check descriptive stats
– Check relationship among variables
– Check patterns of observations
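The common steps above can be sketched with pandas. This is a minimal illustration on a small hypothetical table; the column names (body_style, horsepower, price) are assumptions made for the example and are not taken from automobile.csv.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative assumptions.
df = pd.DataFrame({
    "body_style": ["sedan", "hatchback", "sedan", "wagon", "sedan", "hatchback"],
    "horsepower": [111.0, 154.0, 102.0, None, 115.0, 110.0],
    "price": [13495, 16500, 13950, 17450, 15250, 17710],
})

# Columns, data types, and non-null counts
df.info()

# Missing values per column
missing = df.isna().sum()
print(missing)

# Categorical variable: count of each category
style_counts = df["body_style"].value_counts()
print(style_counts)

# Numerical variables: descriptive statistics
summary = df.describe()
print(summary)

# Relationship among numerical variables (pairwise correlation)
corr = df[["horsepower", "price"]].corr()
print(corr)

# Patterns of observations: look at a few rows directly
print(df.head())
```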
Data Visualization Basics
▪ Histogram
– For univariate analysis of a continuous variable
– Capture the distribution, skewness, kurtosis
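As a sketch, a histogram can be drawn with matplotlib and the skewness and kurtosis it suggests can be checked numerically with pandas. The data below is randomly generated and the variable name price is an assumption for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical right-skewed continuous variable
rng = np.random.default_rng(0)
price = pd.Series(rng.lognormal(mean=9.5, sigma=0.5, size=500), name="price")

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(price, bins=20, edgecolor="black")
ax.set_xlabel("price")
ax.set_ylabel("frequency")

# Numerical counterparts of what the histogram shows
skewness = price.skew()         # > 0 means a longer right tail
excess_kurtosis = price.kurt()  # pandas reports excess kurtosis (normal = 0)
print(skewness, excess_kurtosis)
```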
Data Visualization Basics
▪ Boxplots
– For both univariate and bivariate analyses
– Capture the distribution and skewness
– Show outliers
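A minimal sketch of both uses of a boxplot follows, with the 1.5 × IQR rule that matplotlib applies by default to flag outliers. The table and its column names (drive_wheels, price) are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one numerical and one categorical column
df = pd.DataFrame({
    "drive_wheels": ["fwd"] * 6 + ["rwd"] * 6,
    "price": [7000, 7500, 8000, 8200, 9000, 20000,
              15000, 16000, 17000, 18000, 19000, 21000],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# Univariate: distribution of price alone
ax1.boxplot(df["price"].to_numpy())
ax1.set_title("price")

# Bivariate: price split by a categorical variable
grouped = {name: g["price"].to_numpy() for name, g in df.groupby("drive_wheels")}
ax2.boxplot(list(grouped.values()))
ax2.set_xticks(range(1, len(grouped) + 1))
ax2.set_xticklabels(list(grouped.keys()))
ax2.set_title("price by drive_wheels")

# Points beyond 1.5 * IQR from the quartiles are drawn as outliers
fwd = df.loc[df["drive_wheels"] == "fwd", "price"]
q1, q3 = fwd.quantile(0.25), fwd.quantile(0.75)
iqr = q3 - q1
outliers = fwd[(fwd < q1 - 1.5 * iqr) | (fwd > q3 + 1.5 * iqr)]
print(outliers.tolist())
```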
Data Visualization Basics
▪ Bar plot
– Similar function to a boxplot: compares a
numerical variable across categories
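One way to read the comparison with a boxplot is that both compare a numerical variable across categories; a bar shows a single summary value per category (here the mean, which is a choice made for this sketch). The data and column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "body_style": ["sedan", "sedan", "sedan", "hatchback", "hatchback", "wagon"],
    "price": [10000, 12000, 14000, 8000, 9000, 11000],
})

# One summary value per category (groupby sorts categories alphabetically)
mean_price = df.groupby("body_style")["price"].mean()

fig, ax = plt.subplots()
bars = ax.bar(mean_price.index, mean_price.to_numpy())
ax.set_xlabel("body_style")
ax.set_ylabel("mean price")
print(mean_price)
```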
Data Visualization Basics
▪ Scatterplots
– For bivariate analysis
– Show the relationship between variables
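A short sketch of a scatterplot for two numerical variables, together with the correlation coefficient that summarizes the linear relationship it displays. The data is randomly generated; horsepower and price are assumed names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data with a positive linear relationship plus noise
rng = np.random.default_rng(1)
horsepower = rng.uniform(60, 200, size=100)
price = 100 * horsepower + rng.normal(0, 1500, size=100)

fig, ax = plt.subplots()
points = ax.scatter(horsepower, price)
ax.set_xlabel("horsepower")
ax.set_ylabel("price")

# Pearson correlation between the two variables
r = np.corrcoef(horsepower, price)[0, 1]
print(r)
```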
Data Visualization Basics
▪ Correlation matrix
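A correlation matrix can be computed with pandas and drawn as a heatmap. The sketch below uses generated data and assumed column names; the diverging colormap fixed to the range [-1, 1] is one reasonable choice, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numerical columns
rng = np.random.default_rng(2)
horsepower = rng.uniform(60, 200, size=80)
df = pd.DataFrame({
    "horsepower": horsepower,
    "price": 100 * horsepower + rng.normal(0, 1500, size=80),
    "city_mpg": 50 - 0.15 * horsepower + rng.normal(0, 2, size=80),
})

corr = df.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)

# Write each coefficient inside its cell
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
```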
Data Visualization Basics
▪ What if we want to see more than two variables?
– Use a color scheme
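As an illustration of the color approach, a third numerical variable can be mapped to the color of each point in a scatterplot. The data is generated and the names (horsepower, price, engine_size) are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: three numerical variables
rng = np.random.default_rng(3)
horsepower = rng.uniform(60, 200, size=100)
price = 100 * horsepower + rng.normal(0, 1500, size=100)
engine_size = 0.8 * horsepower + rng.normal(0, 10, size=100)

fig, ax = plt.subplots()
# x and y carry two variables; color carries the third
sc = ax.scatter(horsepower, price, c=engine_size, cmap="viridis")
ax.set_xlabel("horsepower")
ax.set_ylabel("price")
fig.colorbar(sc, ax=ax, label="engine_size")
```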
Data Visualization Basics
▪ What if we want to see more than two variables?
– Use different shapes
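Marker shape works best for a categorical third variable. In this sketch each category gets its own marker; the data, the column names, and the marker assignment are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two numerical variables and one categorical variable
df = pd.DataFrame({
    "horsepower": [70, 85, 90, 150, 160, 175, 110, 120, 130],
    "price": [7000, 8000, 8500, 18000, 20000, 23000, 12000, 13000, 15000],
    "drive_wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd", "4wd"],
})

# One marker shape per category (an arbitrary assignment)
markers = {"fwd": "o", "rwd": "s", "4wd": "^"}

fig, ax = plt.subplots()
for name, group in df.groupby("drive_wheels"):
    ax.scatter(group["horsepower"], group["price"],
               marker=markers[name], label=name)
ax.set_xlabel("horsepower")
ax.set_ylabel("price")
ax.legend(title="drive_wheels")
```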
Data Visualization Pitfalls
▪ Data visualization is a powerful tool to provide
insights into the data
Data Visualization Pitfalls
▪ Manipulation of the baseline
– The natural baseline of bar charts should be zero.
– A truncated graph, which uses a nonzero baseline,
can lead to incorrect interpretation.
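The effect of a truncated baseline can be quantified by comparing the visible bar heights. In this sketch the two values (95 and 100) and the helper visible_ratio are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical values for two groups
names = ["A", "B"]
heights = [95, 100]

fig, (ax_zero, ax_trunc) = plt.subplots(1, 2, figsize=(8, 4))

# Baseline at zero: bar heights are proportional to the values
ax_zero.bar(names, heights)
ax_zero.set_ylim(0, 110)
ax_zero.set_title("baseline = 0")

# Truncated baseline: only the part above 90 is visible
ax_trunc.bar(names, heights)
ax_trunc.set_ylim(90, 110)
ax_trunc.set_title("baseline = 90")

def visible_ratio(a, b, baseline):
    """Ratio of the visible heights of bars b and a above the baseline."""
    return (b - baseline) / (a - baseline)

true_ratio = visible_ratio(95, 100, baseline=0)        # about 1.05
truncated_ratio = visible_ratio(95, 100, baseline=90)  # 10 / 5 = 2.0
print(true_ratio, truncated_ratio)
```

With the baseline at 90, B looks twice as tall as A even though it is only about 5% larger.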
Data Visualization Pitfalls
▪ Manipulation of the y-axis
– The y-axis represents the magnitude of the plot
– Stretching or squeezing the y-axis can lead to a
misperception of the variability of the data
– Especially salient for graphs that show changes
over time
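The same series can look volatile or flat depending on the y-axis range. The sketch below plots one hypothetical yearly series twice and measures how much of the axis height the data occupies; the numbers and the helper fraction_of_axis are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical yearly series with small changes
years = np.arange(2015, 2024)
sales = np.array([100, 101, 103, 102, 104, 105, 104, 106, 107])

fig, (ax_tight, ax_wide) = plt.subplots(1, 2, figsize=(8, 4))

# Tight y-axis: small changes look dramatic
ax_tight.plot(years, sales)
ax_tight.set_ylim(99, 108)

# Wide y-axis: the same changes look negligible
ax_wide.plot(years, sales)
ax_wide.set_ylim(0, 500)

def fraction_of_axis(series, ylim):
    """Share of the y-axis height covered by the range of the data."""
    return (series.max() - series.min()) / (ylim[1] - ylim[0])

tight_share = fraction_of_axis(sales, ax_tight.get_ylim())
wide_share = fraction_of_axis(sales, ax_wide.get_ylim())
print(tight_share, wide_share)
```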
Data Visualization Pitfalls
▪ Manipulation of the timeframe
– In line graphs that intend to show the time trend,
the range of the x-axis can control the narrative
– Check which trend the graph intends to show
(e.g., short-term or long-term) and compare that
with the narrative
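To see how the chosen x-axis range changes the story, the sketch below fits a straight line to a hypothetical monthly series over the full period and over the last six months only. The series and the helper trend_slope are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly series: long upward trend followed by a recent dip
dates = pd.date_range("2020-01-01", periods=36, freq="MS")
values = np.concatenate([np.linspace(100, 170, 30), np.linspace(168, 150, 6)])
series = pd.Series(values, index=dates)

def trend_slope(s):
    """Slope of a least-squares line through the series (units per month)."""
    x = np.arange(len(s))
    slope, _intercept = np.polyfit(x, s.to_numpy(), 1)
    return slope

recent = series.iloc[-6:]
long_slope = trend_slope(series)   # full three years: upward
short_slope = trend_slope(recent)  # last six months: downward

fig, (ax_long, ax_short) = plt.subplots(1, 2, figsize=(8, 4))
ax_long.plot(series.index, series.to_numpy())
ax_long.set_title("full timeframe")
ax_short.plot(recent.index, recent.to_numpy())
ax_short.set_title("last 6 months")
print(long_slope, short_slope)
```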
Data Visualization Pitfalls
▪ Manipulation of the graph type
– Different types of graphs have different applications
(e.g., time trend, emphasizing differences)
– Using the “wrong” type of graph can mislead
the readers
Data Visualization Pitfalls
▪ Use of a different standard
– People tend to share a “common sense” of how
data should be visualized
– For example, the color that represents positivity
vs. negativity, the shade that represents the
density, etc.
– Going against this “common sense” can mislead
the audience
Let’s do some coding!
▪ Please download automobile.csv and Week2-1.py
from MyCourses.