
Data Analysis and Visualization
Stephen Paul G. Cajigas
Data Analysis
Definition
the practice of working with data to glean useful information, which can then be used to make informed decisions

Source: https://www.coursera.org/articles/what-is-data-analysis-with-examples
Steps
1. Identify the objectives
2. Gather data
3. Clean the data set
4. Perform analysis
5. Interpret the results
Basic Statistics Concepts
Statistics
a science dealing with the collection, analysis, interpretation, and presentation of numerical data

Population
a collection of persons, objects, or items of interest

Census
the gathering of data from the whole population for a given measurement of interest

Sample
a portion of the whole
if properly taken, it is representative of the whole
Levels of Data Measurement
1. Nominal Level
2. Ordinal Level
3. Interval Level
4. Ratio Level
Nominal Level
- data can only be classified into categories; the categories have no inherent order

Examples:
1. Sex
2. Residence

Ordinal Level
- data can be classified and ranked, but the gaps between ranks are not necessarily equal

Examples:
1. Likert scale
2. Educational attainment
3. Socio-economic status

Interval Level
- differences between values are meaningful, but there is no true zero point

Examples:
1. IQ
2. Date

Ratio Level
- meaningful zero point and meaningful ratios between values

Examples:
1. Height
2. Weight
3. Age

Ratio- and interval-level data are classified as continuous (quantitative) variables, while ordinal- and nominal-level data are classified as categorical data.
Numerical Measures
Measures of Central Tendency
1. Mean
2. Median
3. Mode

Measures of Position (Quantiles)
1. Percentile
2. Decile
3. Quartile

Measures of Dispersion
1. Range
2. Variance/Standard Deviation
Measures of Central Tendency
Arithmetic Mean
the most commonly used measure
the sum of all the data divided by the number of entries

Weighted Mean
each entry has a corresponding "weight"
each entry is multiplied by its weight, the products are summed, and the sum is divided by the total weight

Geometric Mean
the nth root of the product of all n data values
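
The three means defined above can be sketched in a few lines of Python. This is only an illustration: the function names, the scores, the weights, and the growth factors are all hypothetical and do not come from the slides.

```python
import math

def arithmetic_mean(values):
    """Sum of all the data divided by the number of entries."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of (value * weight), divided by the total weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def geometric_mean(values):
    """The nth root of the product of all n values (positive values only)."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical inputs, chosen only for illustration.
scores = [80, 90, 100]
units = [3, 3, 4]          # the weight attached to each score
growth_factors = [2, 8]

print(arithmetic_mean(scores))         # (80 + 90 + 100) / 3
print(weighted_mean(scores, units))    # (240 + 270 + 400) / 10
print(geometric_mean(growth_factors))  # square root of 2 * 8
```

Note that the weighted mean reduces to the arithmetic mean when every weight is the same.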
Median
the middle value of the data when sorted
is not affected by extreme values, so it is used when the data set has outliers

Consider this data set
Age of respondents:
23, 24, 25, 25, 26, 28, 62
Mean = 30.4
Median = 25
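
The age example can be checked with Python's standard `statistics` module. The second half of the sketch, which drops the value 62, is an added illustration of why the median is preferred when outliers are present; it is not part of the original example.

```python
import statistics

ages = [23, 24, 25, 25, 26, 28, 62]

mean_age = statistics.mean(ages)      # pulled upward by the outlier 62
median_age = statistics.median(ages)  # middle value of the sorted data

print(round(mean_age, 1))  # 30.4
print(median_age)          # 25

# Added illustration: remove the outlier and compare again.
without_outlier = ages[:-1]
print(round(statistics.mean(without_outlier), 1))  # the mean drops noticeably
print(statistics.median(without_outlier))          # the median is still 25
```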
Mode
the most frequently appearing value
a data set can have multiple modes, or none at all

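
A minimal sketch of finding the mode, covering the "multiple modes" and "no mode" cases. The function name `modes` is hypothetical, and treating a data set in which every value appears exactly once as having no mode is an assumed convention, not something the slides specify.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: all values occur once, so there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([23, 24, 25, 25, 26, 28, 62]))  # one mode
print(modes([1, 1, 2, 2, 3]))               # two modes
print(modes([1, 2, 3]))                     # no mode
```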
Moving Average
an average computed over a fixed span of consecutive data points
each new data point of the series is sequentially included in the averaging, while the oldest data point in the span of the average is removed
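
The "include the newest point, remove the oldest" idea can be sketched as a sliding window. The function name, the `span` parameter, and the monthly figures are hypothetical.

```python
def moving_average(series, span):
    """Simple moving average over a window of `span` consecutive points."""
    if span <= 0 or span > len(series):
        raise ValueError("span must be between 1 and len(series)")
    window_sum = sum(series[:span])
    averages = [window_sum / span]
    for i in range(span, len(series)):
        # Include the newest point and remove the oldest one in the span.
        window_sum += series[i] - series[i - span]
        averages.append(window_sum / span)
    return averages

# Hypothetical monthly values, for illustration only.
monthly_values = [10, 20, 30, 40, 50]
print(moving_average(monthly_values, span=3))  # [20.0, 30.0, 40.0]
```

A series of n points with a span of k yields n - k + 1 averages, so a longer span gives a smoother but shorter result.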

Measures of Position
Quantiles
the data set is divided into equal-sized subgroups

Percentile
the data set is divided into 100 equal-sized subgroups

Decile
the data set is divided into 10 equal-sized subgroups

Quartile
the data set is divided into 4 equal-sized subgroups
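
As a rough sketch, the cut points that split a data set into 4, 10, or 100 groups can be computed with `statistics.quantiles` from the Python standard library. The data set is hypothetical, and several conventions exist for interpolating quantiles; this example uses the function's default method, so other tools may report slightly different cut points.

```python
import statistics

values = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11

# n groups are separated by n - 1 cut points.
quartiles = statistics.quantiles(values, n=4)
deciles = statistics.quantiles(values, n=10)
percentiles = statistics.quantiles(values, n=100)

print(quartiles)         # [3.0, 6.0, 9.0]
print(len(deciles))      # 9
print(len(percentiles))  # 99

# The second quartile is the median.
print(quartiles[1] == statistics.median(values))
```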
Measures of Dispersion
Range
the difference between the largest and the smallest value in the data set

Variance
the average of the squared deviations about the arithmetic mean for a set of numbers

Standard Deviation
the square root of the variance
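
The three measures of dispersion translate directly into code. This sketch follows the definition given here, dividing by the number of entries (the population variance). The function names and the data set are hypothetical. For a sample, many texts divide by n - 1 instead, which is what `statistics.variance` does.

```python
import math
import statistics

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def population_variance(values):
    """Average of the squared deviations about the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def population_std_dev(values):
    """Square root of the variance."""
    return math.sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(data_range(data))           # 7
print(population_variance(data))  # 4.0
print(population_std_dev(data))   # 2.0

# Sample variance (divides by n - 1) is slightly larger.
print(statistics.variance(data))
```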
Other common numerical measures
Frequency
the number of times an event is observed, occurred, or was recorded

Percentage
literally means "per hundred"
can be calculated by dividing the part by the whole, then multiplying by 100

Fraction
the ratio of the part to the whole
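
Frequency, fraction, and percentage can all be derived from the same tally. The survey responses below are hypothetical and serve only to illustrate the three measures side by side.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical survey responses.
responses = ["yes"] * 5 + ["no"] * 3 + ["undecided"] * 2

frequency = Counter(responses)   # how many times each answer was recorded
total = sum(frequency.values())  # the "whole"

summary = {
    answer: {
        "frequency": count,
        "fraction": Fraction(count, total),  # part over whole, simplified
        "percentage": 100 * count / total,   # part over whole, per hundred
    }
    for answer, count in frequency.items()
}

for answer, row in summary.items():
    print(answer, row["frequency"], row["fraction"], f"{row['percentage']}%")
```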
Conversion: percentage to/from fraction
Fraction to percentage
Simply divide the numerator by the denominator, then multiply by 100%
1/2 = 50%
2/3 ≈ 66.67%
14/58 ≈ 24.14%

Percentage to fraction
If the percentage is a whole number (no decimals), divide by 100%, then simplify
80% = 80/100 = 8/10 = 4/5
32% = 32/100 = 16/50 = 8/25
Percentage to fraction
If the percentage has decimals, convert it into a whole number by multiplying by 10, 100, etc., then divide by 100 times the number that was multiplied, and simplify
18.2% = 182/1000 = 91/500
63.24% = 6324/10,000 = 1581/2,500
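
Both conversions can be sketched with Python's `fractions` module, which simplifies fractions automatically. The function names are hypothetical, and passing the percentage as a string is a design choice made here to avoid floating-point rounding of values such as 18.2.

```python
from decimal import Decimal
from fractions import Fraction

def fraction_to_percentage(numerator, denominator, digits=2):
    """Divide the numerator by the denominator, then multiply by 100."""
    return round(100 * numerator / denominator, digits)

def percentage_to_fraction(percentage):
    """Convert a percentage written as text (e.g. "18.2") to a simplified fraction."""
    return Fraction(Decimal(percentage)) / 100

print(fraction_to_percentage(1, 2))     # 50.0
print(fraction_to_percentage(2, 3))     # 66.67
print(fraction_to_percentage(14, 58))   # 24.14

print(percentage_to_fraction("80"))     # 4/5
print(percentage_to_fraction("32"))     # 8/25
print(percentage_to_fraction("18.2"))   # 91/500
print(percentage_to_fraction("63.24"))  # 1581/2500
```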
Other common numerical measures
Simplified Ratio
a common way of reporting a ratio is to round it so that the denominator becomes 5, 10, 20, 25, or another multiple of 5

18.2% = 91/500 ≈ 9/50
63.24% = 1581/2500 ≈ 16/25 (or, more coarsely, ≈ 6/10 = 3/5)
233/542 ≈ 42.99% ≈ 43/100 ≈ 4/10 = 2/5
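
One way to sketch the rounding step is to pick a target denominator and round the numerator to the nearest whole number. The function name and the choice of target denominators are assumptions made for illustration. The result is an approximation, so the coarser the denominator, the further it may drift from the exact value.

```python
from fractions import Fraction

def simplified_ratio(proportion, denominator):
    """Approximate a proportion (between 0 and 1) as numerator/denominator."""
    numerator = round(proportion * denominator)
    return numerator, denominator

print(simplified_ratio(0.182, 50))        # (9, 50)
print(simplified_ratio(0.6324, 5))        # (3, 5)
print(simplified_ratio(233 / 542, 100))   # (43, 100)
print(simplified_ratio(233 / 542, 10))    # (4, 10)

# Fraction reduces the result to lowest terms: 4/10 becomes 2/5.
print(Fraction(*simplified_ratio(233 / 542, 10)))
```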
Data Visualization
graphical representation of information and data
makes it easier to identify patterns, trends, and outliers in large data sets
Common Types of Graphs
1. Pie Graph
2. Bar Graph
3. Histogram
4. Line Graph
5. Scatter Plot
Pie Graph
a circular depiction of data where the area of the whole pie represents 100% of the data and slices of the pie represent a percentage breakdown of the sublevels

[Figure: pie chart with five equal slices, Item 1 to Item 5, 20% each]

When to use?
1. when you are trying to compare parts of a whole
2. when data is one-dimensional

When NOT to use?
1. when there are many categories
2. when the data have only small differences

Considerations
1. all categories must be included
2. using a 3D pie chart can be misleading

[Figure: pie chart with Item 1 at 18.2%, Item 2 at 9.1%, Item 3 at 54.5%, and Item 4 at 18.2%]
Bar Graphs
contains two or more categories along one axis and a series of bars, one for each category, along the other axis
the length of the bar represents the magnitude of the measure (amount, frequency, money, percentage, etc.) for each category

[Figure: bar graph of Item 1 to Item 5, vertical axis from 0 to 25]

When to use?
1. when you are trying to compare different categories
2. when data is two-dimensional

When NOT to use?
1. when the categories are using different scales
2. when the categories are not homogeneous
Histogram
a series of contiguous bars or rectangles that represent the frequency of data in given class intervals
each data interval is of the same width

[Figure: histogram with class boundaries from 8 to 22 in steps of 2, frequency axis from 0 to 5]

When to use?
1. when you are trying to show distribution

When NOT to use?
1. when categories are not the same sizes
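
The counting behind a histogram can be sketched without any plotting library: each value is assigned to one of several class intervals of equal width, and the bar height is the count. The function name, the measurements, and the class boundaries (8, 10, ..., 22) are assumptions for illustration.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall in each equal-width class interval.

    Intervals are half-open, [lower, upper), except the last one,
    which also includes its upper boundary.
    """
    counts = [0] * bins
    end = start + width * bins
    for value in values:
        if value == end:
            counts[-1] += 1
        elif start <= value < end:
            counts[int((value - start) // width)] += 1
        # Values outside [start, end] are ignored in this sketch.
    return counts

# Hypothetical measurements.
measurements = [8.5, 9, 10, 11.5, 12, 12.5, 13, 15, 15.5, 17, 21]
counts = histogram_counts(measurements, start=8, width=2, bins=7)

# Text rendering: one row per class interval, one '#' per observation.
for i, count in enumerate(counts):
    lower = 8 + 2 * i
    print(f"{lower:>2}-{lower + 2:<2} | {'#' * count}")
```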
Pictogram
uses picture symbols to illustrate statistical information

When to use?
1. graphic illustration of a part-to-whole ratio

When NOT to use?
1. when the divisor (denominator) of the ratio is too large
Line Graph
used to show information that changes over time
the horizontal axis is the time/period

Consideration
the time intervals must be equally spaced

[Figure: line graph, vertical axis from 10 to 40]
Scatter Plot
a two-dimensional plot of pairs of points from two numerical variables

[Figure: scatter plot, horizontal axis from 0 to 30, vertical axis from 0 to 250]

When to use?
1. when you have paired numerical data
2. when trying to find a correlation between the two variables

When NOT to use?
1. when data is already grouped
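
The correlation that a scatter plot suggests visually can be quantified. The sketch below uses the Pearson correlation coefficient, which is one common choice and an assumption here, since the slides do not name a specific measure. The function name and the paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired numerical data."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.775
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.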
Regression
relates a dependent variable to one or more independent (explanatory) variables
useful in predicting values

Linear Equation
y = mx + b
where m is the slope and b is the y-intercept
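
To show how the line y = mx + b can be obtained from data and then used for prediction, here is a minimal sketch using ordinary least squares. Least squares is an assumption, since the slides give only the equation and not a fitting method. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept b in y = mx + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    b = mean_y - m * mean_x
    return m, b

def predict(x, m, b):
    """Predicted value of the dependent variable for a given x."""
    return m * x + b

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
print(round(m, 2), round(b, 2))    # 0.6 2.2
print(round(predict(6, m, b), 2))  # 5.8
```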
Ethical Considerations in Visualizing Data
Integrity
Do not change the data
Do not mislead
- truncated/extended axis
- uneven width of bars
- overlaying graphs

[Figure: graph over the years 2016 to 2020, vertical axis from 0 to 100]
Data Story
Exploratory Analysis
Getting familiar with the data
Identifying trends and outliers
"process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones"

Explanatory Analysis
Instead of explaining what happened, you’re more focused on how and why it happened and what should happen next
you have something specific you want to show an audience
Data Storytelling
What is it?
the process of creating a story out of data analysis findings
numbers and data need to be put into context for us to understand them
stories allow us to quickly and easily grasp insights
stories help us remember those insights over time
Elements of a data story
1. Relevant
2. Has good data
3. Good narrative
4. Intentional visuals
   a. appropriate for the data
   b. well-labeled
   c. legible
   d. not misleading
Structure of a data story
1. Setup/Context
   The "before" state of the data
2. Conflict/Challenge
   How the data changes
   Ask "what is causing the change?"
3. Resolution
   The "after" state that the change leads to

Distinguish each phase of your data-driven story with separate images and descriptive titles. Highlight only the important information and leave everything else out.
Steps in making a data story
1. Know your audience
2. Pinpoint the data that matters
3. Outline the story arc
4. Create a craft design
5. Assess your blind spots
6. Polish up the visuals and share
Tips in making a good data story
1. There is no specific order in reading a graph
2. Our eyes focus on things that stand out
3. Our eyes can handle only a few things at once
4. We try to find meaning in the data
5. We are guided by cultural convention
Workshop
More Examples
https://www.storytellingwithdata.com/blog/2018/3/9/bring-on-the-bar-chartss (storytelling with data)
