Topic 4 - Presenting and Describing Data V1
Learning Objectives
Introduction
Descriptive statistics
• Involves the arrangement, summary and presentation of data, to
enable meaningful interpretation and support decision making.
Introduction (cont)
2.1 Types of data
Variable
• A characteristic of a population or sample that is of
interest to us.
Examples
• cereal choice (brand names)
• capital expenditure
• waiting time for medical services.
2.1 Types of data (cont)
Data
• The actual (or observed) values of variables
–Numerical (quantitative) data are observations
taking real number values.
–Nominal (categorical) data are categorical
observations.
–Ordinal (ranked) data are ordered categorical
observations.
Types of data – examples
Computer brand (nominal):
1 IBM
2 Dell
3 Compaq
4 IBM
Weight gain (numerical):
+10
+5
With nominal data, all we can calculate is the proportion of data that falls
into each category.
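As a minimal sketch of that calculation, the proportions for the four computer-brand observations listed above could be computed as follows. The function name `category_proportions` is an illustrative assumption, not something from the slides.

```python
from collections import Counter

def category_proportions(observations):
    """Return the share of observations that fall into each category."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# The four computer-brand observations from the slide.
brands = ["IBM", "Dell", "Compaq", "IBM"]
proportions = category_proportions(brands)
print(proportions)
```

Arithmetic such as an average brand has no meaning here; only the counts and the proportions derived from them do.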
Types of data – examples
Ordinal data
Exam grades
HD
D
C
P
F
Types of data – analysis
Cross-sectional & time-series data
• Cross-sectional data is collected at a certain point in
time across a number of units of interest
–marketing survey (observe preferences by gender,
age)
–test score in a statistics course exam
–starting salaries of graduates of an MBA program in
a particular year
• Time-series data is collected over successive points in
time
–weekly closing price of gold
–monthly tourist arrivals in Australia
2.2 Graphical techniques for nominal data
Pie charts
• Despite its drawbacks, the pie chart is a very popular tool for
showing the proportion of observations in each category of nominal
data.
Example 2.1
–To determine the approximate market share of
various women’s magazines, a women’s magazine
readership survey was conducted using a sample of
200 readers.
–Data was collected and the count of the occurrences
was recorded for each magazine.
–These counts were converted to proportions and the
results were presented in a pie chart.
Pie charts (cont)
• The pie chart is a circle, subdivided into a number of slices
that represent the various categories.
• The size of each slice is proportional to the percentage
corresponding to the category it represents.
Pie chart of women's magazine readership
Women's Weekly 20%
Woman's Day 19%
New Idea 16%
Better Homes 10%
That's Life 9%
Other 26%
Slice angle for Better Homes: (10/100)(360°) = 36°
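The slice-angle arithmetic can be sketched for every category at once. The reader counts below are those shown in the bar chart for Example 2.1 (sample of 200 readers); the function name `slice_angles` is an illustrative assumption.

```python
def slice_angles(counts):
    """Convert category counts to pie-slice angles in degrees."""
    total = sum(counts.values())
    # Each slice gets the same share of 360 degrees as its share of the total.
    return {name: count * 360 / total for name, count in counts.items()}

# Reader counts from Example 2.1.
readers = {
    "Women's Weekly": 40,
    "Woman's Day": 38,
    "New Idea": 32,
    "Better Homes": 20,
    "That's Life": 18,
    "Other": 52,
}
angles = slice_angles(readers)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Better Homes, with 10% of readers, gets a 36-degree slice, and the angles sum to 360 degrees.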
Pie chart presentation
The presentation of pie charts can be improved by observing
the following formatting guidelines:
• Use appropriate titles: the title should tell the reader what
variable the chart is summarizing.
• Use appropriate labeling: data labels should be used instead
of a legend.
• Use only one color: using the same color for all sectors
emphasizes that the categories are part of the same series.
• Choose appropriate fonts: use a large or bold font for titles and
a smaller font for data labels to avoid clutter.
• Beware of explosions: pulling the sectors a long way apart
destroys the integrity of the pie and should be avoided.
• Start at midday: by default, Excel draws the first sector at
the 12 o'clock position.
Source: Hunt, D.N. and Mashhoudy, H. (2008). The Humble Pie - Half-Baked or Well Done?
Teaching Statistics, 30(1), 6-12.
Bar charts
Example 2.1 (cont)
(Excel representation)
Bar chart of women's magazine readership
(vertical axis: number of readers, 0 to 60)
Women's Weekly 40
Woman's Day 38
New Idea 32
Better Homes 20
That's Life 18
Other 52
Bar charts (cont)
–Use bar charts also when the order in which data
are presented is meaningful.
(Bar chart: total number of air conditioners manufactured in a factory, 1999–2004; vertical axis from 0 to 20 000, horizontal axis showing the years 1999 to 2004.)
2.3 Graphical techniques for numerical data
Example 2.2
• Providing information concerning the monthly bills of new
subscribers in the first month after signing on with a
telephone company
–collect data
–prepare a frequency distribution
–draw a histogram
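The "prepare a frequency distribution" step can be sketched as below. This is a rough illustration under stated assumptions: the sample bills are hypothetical (they are not the 200 observations from Example 2.2), the function name `frequency_distribution` is invented for illustration, and each class is assumed to include its upper limit, as Excel's histogram bins do.

```python
import math

def frequency_distribution(values, width, n_classes):
    """Count the values in each class; every class includes its upper limit."""
    counts = [0] * n_classes
    for value in values:
        # A value exactly on a class limit belongs to the lower class.
        index = max(0, math.ceil(value / width) - 1)
        if index >= n_classes:
            raise ValueError(f"{value} lies above the last class limit")
        counts[index] += 1
    upper_limits = [width * (i + 1) for i in range(n_classes)]
    return list(zip(upper_limits, counts))

# Hypothetical monthly bills, for illustration only.
sample_bills = [3.50, 14.20, 15.00, 29.99, 42.10, 88.75, 104.30, 119.63]
table = frequency_distribution(sample_bills, width=15, n_classes=8)
for upper, count in table:
    print(f"up to {upper}: {count}")
```

The resulting (bin, frequency) pairs are what the histogram then displays as bar heights.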
Example 2.2 (cont)
(There are 200 data points; the slide marks the largest and smallest observations.)
Example 2.2 (cont)
Example 2.2 (cont)
Draw a histogram
Bin    Frequency
15     71
30     37
45     13
60     9
75     10
90     18
105    28
120    14
(Histogram: frequency on the vertical axis, 0 to 80; bills on the horizontal axis, bins 15 to 120.)
Example 2.2 …
What information can we extract from this histogram?
(Histogram of bills: frequency on the vertical axis; bins 15 to 120 on the horizontal axis.)
Relative frequency
Class relative frequency = (Class frequency) / (Total number of observations)
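Applied to the bin frequencies of Example 2.2, the formula can be sketched as follows. The function name `relative_frequencies` is an illustrative assumption.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [frequency / total for frequency in frequencies]

# Bin frequencies from Example 2.2 (bins 15, 30, ..., 120).
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
relative = relative_frequencies(frequencies)
for upper, share in zip(range(15, 121, 15), relative):
    print(f"bin {upper}: {share:.3f}")
```

The first class holds 71 of the 200 bills, a relative frequency of .355, and the relative frequencies across all classes sum to 1.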
Relative frequency (cont)
Class width
Shapes of histograms
Symmetry
Shapes of histograms (cont)
Skewness
Negatively skewed
Positively skewed
Modal classes
A unimodal histogram
A bimodal histogram
Cumulative frequency of a class
Ogives
• An ogive is a graph of a cumulative relative frequency distribution.
Class      Frequency   Cumulative frequency   Cumulative relative frequency
0-15       71          71                     71/200 = .355
15-30      37          108                    108/200 = .540
30-45      13          121                    121/200 = .605
45-60      9           130                    130/200 = .650
60-75      10          140                    140/200 = .700
75-90      18          158                    158/200 = .790
90-105     28          186                    186/200 = .930
105-120    14          200                    200/200 = 1.000
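The cumulative columns can be reproduced with a running total, as in this minimal sketch. The variable names are illustrative assumptions.

```python
from itertools import accumulate

# Class frequencies from Example 2.2.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

# Running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Dividing by the total number of observations gives the ogive heights.
total = cumulative[-1]
ogive_points = [value / total for value in cumulative]

for upper, point in zip(range(15, 121, 15), ogive_points):
    print(f"up to {upper}: {point:.3f}")
```

Plotting each value against its upper class limit and joining the points gives the ogive.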
Ogive
(Ogive: cumulative % on the vertical axis, .00% to 120.00%; bins 15 to 120 on the horizontal axis.)
Example 2.5
Scatter diagram
• A scatter diagram can describe the relationship
between advertising expenditure and sales.
Scatter diagram
Advert   Sales
1        30
3        40
5        40
4        50
2        35
5        50
3        35
2        25
(Excel scatter diagram: sales on the vertical axis, 0 to 60; advertising expenditure on the horizontal axis, 0 to 6.)
Typical patterns
• Positive linear relationship
• No relationship
• Negative linear relationship
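As a rough numerical companion to reading the pattern off a scatter diagram, the sign of the sample covariance indicates whether two variables tend to move together. The slides do not introduce covariance, so treat this as an assumption-laden sketch; the function name `relationship_direction` is hypothetical. The data are the advertising and sales figures from the scatter diagram slide.

```python
def relationship_direction(x, y):
    """Classify the linear tendency between x and y by the sign of the sample covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if covariance > 0:
        return "positive"
    if covariance < 0:
        return "negative"
    return "no linear tendency"

# Advertising expenditure and sales from the scatter diagram slide.
advert = [1, 3, 5, 4, 2, 5, 3, 2]
sales = [30, 40, 40, 50, 35, 50, 35, 25]
direction = relationship_direction(advert, sales)
print(direction)
```

For these data the covariance is positive, matching the upward-sloping cloud of points in the scatter diagram. A sign alone says nothing about how strong or how linear the relationship is, so the plot remains the primary tool.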
2.5 Describing time-series data
• Time-series data are graphically depicted on a line
chart.
• A line chart of a time series plots the value of the
variable on the vertical axis and time periods on the
horizontal axis.
• A line chart is alternatively known as a time-series
chart.
Line chart
Line charts (cont)
• Line charts are particularly useful when the trend over time is to be
emphasised.
(Line chart: vertical axis from 0 to 20 000; horizontal axis showing the years 1999 to 2004.)
Standards and benchmarks
• Many consider Charles Joseph Minard’s original time
series chart to be the best statistical graphic ever
drawn. Why?
• He took a two-dimensional space and managed to
accurately depict five data variables:
–size of invading army
–size of retreating army
–geographic location
–temperature; and
–time.
• The multivariate data is presented in such a way as to
provide an intriguing narrative as to the fate of
Napoleon’s army.
Graphical excellence
Edward Tufte of Yale describes graphical excellence
as…
• The well-designed presentation of interesting data – a
matter of substance, of statistics, and of design;
• that gives the viewer the greatest number of ideas in
the shortest time with the least ink in the smallest
space;
• which is nearly always multivariate; and
• which requires telling the truth about the data.
Graphical excellence (cont)
• Graphical excellence deals with the effective use of
graphical techniques.
• Effective graphical techniques are:
–informative
–concise
–clear in presenting the data to the viewer.
• How can we achieve graphical excellence?
Graphical excellence (cont)
Figure 3.1
• Graphical techniques should be used when there is a
large amount of data…
Figure 3.3
• This is a pie chart that contains only 4 numbers
Graphical deception
• Is there a missing scale on one axis?
• Are changes presented in absolute values only, or in percentage form too?
• Has any axis been stretched?
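To see why the absolute-versus-percentage question matters, the sketch below reports both forms of a change. The figures are hypothetical and the function name `describe_change` is an illustrative assumption.

```python
def describe_change(old, new):
    """Return the change from old to new in absolute and percentage terms."""
    absolute = new - old
    percent = absolute * 100 / old
    return absolute, percent

# Hypothetical values: the same absolute rise from two different bases.
small_base = describe_change(100.0, 110.0)
large_base = describe_change(1000.0, 1010.0)
print(small_base, large_base)
```

Both series rise by 10 units, but that is a 10% change from a base of 100 and only a 1% change from a base of 1000. A chart that shows just one of the two forms can leave a misleading impression.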
Written reports
• Here is one suggested method for structuring a
report that presents statistical information & analysis
to others:
1) Objective statement
2) Description of the experiment
3) Results
– Describe using words, tables, and charts.
4) Discussion of limitations
– Discuss problems with the analysis
– Include violations of required conditions,
assumptions, etc.