Data Analysis and Visualization 2
Stephen Paul G. Cajigas
Data Analysis
Definition
the practice of working with data to glean useful information, which can then be used to make informed decisions
Source: https://www.coursera.org/articles/what-is-data-analysis-with-examples
Steps
1. Identify the objectives
2. Gather data
3. Clean the data set
4. Perform analysis
5. Interpret the results
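The five steps can be walked through on a toy example. This is only a sketch: the objective, the sales figures, and the use of None for a missing entry are all hypothetical and not from the slides.

```python
from statistics import mean

# Step 1: identify the objective (hypothetical): what is the typical daily sales count?
# Step 2: gather data (made-up records; None marks a missing entry)
raw_sales = [12, 15, None, 14, 13, None, 16]

# Step 3: clean the data set by dropping the missing entries
clean_sales = [value for value in raw_sales if value is not None]

# Step 4: perform analysis
average_sales = mean(clean_sales)

# Step 5: interpret the results
print(f"Typical daily sales: about {average_sales} units")
```

Dropping missing entries is just one possible cleaning choice; a real data set may call for a different one.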
Basic Statistics Concepts
Statistics
a science dealing with the collection, analysis, interpretation, and presentation of numerical data
Population
a collection of persons, objects, or items of
interest
Census
Sample
a portion of the whole
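The difference between a population and a sample can be sketched in a few lines. The population of 100 ID numbers and the sample size of 10 are hypothetical; a census, by contrast, would examine every member of the population.

```python
import random

# Hypothetical population: ID numbers 1 to 100
population = list(range(1, 101))

random.seed(0)  # fixed seed so the draw is repeatable
# A sample is a portion of the whole, drawn here without replacement
sample = random.sample(population, k=10)

print(len(population), "items in the population")
print(len(sample), "items in the sample:", sorted(sample))
```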
Levels of Data Measurement
1. Nominal Level
2. Ordinal Level
3. Interval Level
4. Ratio Level
Nominal Level
- data may only be classified
Examples:
1. Sex
2. Residence
Ordinal Level
- data are ranked
Examples:
1. Likert scale
2. Educational attainment
Interval Level
- meaningful difference between values
Examples:
1. IQ
2. Date
Ratio Level
- meaningful 0 point and ratio between values
Examples:
1. height
Ratio and interval level data are classified as continuous variables, while ordinal and nominal level data are classified as categorical level data.
Measures of Central Tendency
1. Mean
2. Median
3. Mode
Measures of Position
1. Percentile
2. Decile
3. Quartile
Measures of Dispersion
1. Range
2. Variance/Standard Deviation
Measures of Central Tendency
Arithmetic Mean
most commonly used measure
sum of all the data divided by the number of entries
Weighted Mean
each entry has a corresponding "weight"
each entry is multiplied by its weight and the products are summed, then divided by the total weight
Geometric Mean
the nth root of the product of all n data values
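The three means can be compared on one small data set. The values and weights below are made up for illustration.

```python
import math
from statistics import mean

data = [2, 4, 8]      # hypothetical entries
weights = [1, 2, 1]   # hypothetical weight for each entry

# Arithmetic mean: sum of all the data divided by the number of entries
arithmetic = mean(data)

# Weighted mean: each entry times its weight, summed, divided by the total weight
weighted = sum(w * x for w, x in zip(weights, data)) / sum(weights)

# Geometric mean: nth root of the product of all n entries
geometric = math.prod(data) ** (1 / len(data))

print(arithmetic, weighted, geometric)
```

Here the arithmetic mean is 14/3, the weighted mean is 4.5, and the geometric mean is approximately 4 (the cube root of 64, up to floating-point rounding).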
Median
the middle value of the ordered data
is not affected by extreme values, thus is used when the data set has outliers
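A short sketch of why the median is preferred when outliers are present. The values are hypothetical, and 500 plays the role of the outlier.

```python
from statistics import mean, median

values = [20, 22, 25, 27, 30]   # hypothetical data
with_outlier = values + [500]   # same data plus one extreme value

# The mean is pulled strongly toward the outlier; the median barely moves
print("mean:  ", mean(values), "->", mean(with_outlier))
print("median:", median(values), "->", median(with_outlier))
```

Adding the single extreme value moves the mean from 24.8 to 104, while the median only shifts from 25 to 26.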
Moving Average
each data point of the series is
sequentially included in the
averaging, while the oldest data
point in the span of the average is
removed
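The add-the-newest, drop-the-oldest idea can be sketched as a simple moving average. The function name moving_average, the window parameter, and the sample series are illustrative choices, not taken from the slides.

```python
def moving_average(series, window):
    """Return the simple moving averages of `series` over `window` points."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])          # sum of the first span
    averages = [total / window]
    for i in range(window, len(series)):
        # include the newest point and remove the oldest point in the span
        total += series[i] - series[i - window]
        averages.append(total / window)
    return averages


print(moving_average([10, 20, 30, 40, 50], 3))  # [20.0, 30.0, 40.0]
```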
Measures of Position
Quantiles
the data set is divided into equal-sized subgroups
Percentile
the data set is divided into 100 equal-sized subgroups
Decile
the data set is divided into 10 equal-sized subgroups
Quartile
the data set is divided into 4 equal-sized subgroups
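Python's standard library can compute the cut points that divide a data set into n groups. This is a sketch on hypothetical data; statistics.quantiles uses one particular interpolation convention (its default "exclusive" method), and other tools may give slightly different cut points.

```python
from statistics import quantiles

data = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11

quartiles = quantiles(data, n=4)      # 3 cut points -> 4 groups
deciles = quantiles(data, n=10)       # 9 cut points -> 10 groups
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 groups

print(quartiles)  # [3.0, 6.0, 9.0]
print(len(deciles), len(percentiles))
```

Dividing into n groups always needs n - 1 cut points, which is why there are 3 quartile values, 9 decile values, and 99 percentile values.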
Range
the difference between the largest and
the smallest value in the data set
Variance
the average of the squared deviations from the mean; the standard deviation is its square root
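Range and variance can be computed directly from their definitions. The data values are hypothetical. The code treats the data as a whole population (dividing by the number of entries), which matches "the average of the squared deviations"; a sample variance would divide by one less than the number of entries instead.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Variance: average of the squared deviations from the mean
mu = sum(data) / len(data)
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance
std_dev = variance ** 0.5

print(data_range, variance, std_dev)  # 7 4.0 2.0
# The standard library's population functions give the same values
print(pvariance(data), pstdev(data))
```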
Percentage
literally means "per hundred"
Data Visualization
Types of Graphs
Pie Graph
Histogram
Scatter Plot
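Pie Graph
a circular depiction of data where the area of the whole pie represents 100% of the data and each slice represents a category's share
Considerations
[Figure: pie graph with five equal slices, Item 1 to Item 5, 20% each]
[Figure: pie graph with four slices: Item 1 18.2%, Item 2 9.1%, Item 3 54.5%, Item 4 18.2%]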
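The slice labels in a pie graph are percentages: each category's value divided by the total, times 100. In the sketch below the raw counts are an assumption, chosen only because they reproduce the 18.2%, 9.1%, 54.5%, and 18.2% labels of the four-slice pie; the slides do not give the underlying values.

```python
# Hypothetical counts that reproduce the four-slice pie's labels
counts = {"Item 1": 2, "Item 2": 1, "Item 3": 6, "Item 4": 2}

total = sum(counts.values())

# Percentage = part / whole * 100, rounded to one decimal place
shares = {item: round(100 * value / total, 1) for item, value in counts.items()}

for item, share in shares.items():
    print(f"{item}: {share}%")
```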
Bar Graphs
contains two or more categories along one axis and a series of bars, one for each category, along the other axis
the length of the bar represents the magnitude of the measure (amount, frequency, money, percentage, etc.) for each category
When to use?
1. when you are trying to compare different categories
2. when data is two-dimensional
[Figure: bar graph of Item 1 to Item 5, vertical axis from 0 to 25]
When NOT to use?
1. when the categories are using different scales
2. when the categories are not homogeneous
Histogram
a series of contiguous bars or rectangles that represent the frequency of data in each class interval
When to use?
1. when you are trying to show distribution
[Figure: histogram, horizontal axis from 8 to 22, vertical axis from 0 to 5]
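The frequencies behind a histogram come from counting how many values fall into each class interval. The sketch below uses made-up values and intervals of width 2 starting at 8, loosely echoing the axis of the figure; the function name histogram_counts is illustrative.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count values per class interval [lo, lo + width), keyed by lower bound."""
    counts = Counter()
    for value in values:
        bin_index = int((value - start) // width)
        counts[start + bin_index * width] += 1
    return dict(sorted(counts.items()))


values = [9, 11, 11, 13, 14, 14, 15, 17, 19, 21]  # hypothetical data
counts = histogram_counts(values, start=8, width=2)

# Text rendering: one '#' per observation in the interval
for lower, frequency in counts.items():
    print(f"{lower:>2} to {lower + 2:<2} | {'#' * frequency}")
```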
Pictogram
uses picture symbols to illustrate statistical information
When to use?
1. graphic illustration of part to whole ratio
Consideration
the time intervals must be equally spaced
Scatter Plot
a two-dimensional plot of pairs of points from two numerical variables
[Figure: scatter plot, horizontal axis from 0 to 30]
Regression
relates a dependent variable to one or more independent (explanatory) variables
useful in predicting values
Linear Equation
y = mx + b
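One common way to obtain m and b from data is ordinary least squares. The slides do not say which fitting method is meant, so treat this as an assumption; the function name fit_line and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares; return (m, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # intercept
    return m, b


xs = [0, 10, 20, 30]   # independent (explanatory) variable, made-up values
ys = [5, 25, 45, 65]   # dependent variable, lies exactly on y = 2x + 5

m, b = fit_line(xs, ys)
print(m, b)            # 2.0 5.0

# Predicting a value for a new x
print(m * 40 + b)      # 85.0
```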
Ethical Considerations in Visualizing Data
Integrity
Do not change the data
Do not mislead
- truncated/extended axis
- uneven width of bars
- overlaying graphs
[Figure: graph with vertical axis from 0 to 100 and horizontal axis from 2016 to 2020]
Data Story
Exploratory Analysis
Getting familiar with the data
Identifying trends and outliers
"process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones"
Explanatory Analysis
Instead of explaining what happened, you're more focused on how and why it happened and what should happen next
you have something specific you want to show an audience
What is it?
Storytelling
Tips in making a good data story