
Presenting and Describing data

1
Learning Objectives

• Recognise the difference between grouped and ungrouped data


• Construct a frequency distribution
• Construct a histogram, a frequency polygon, an ogive, a pie chart,
a Pareto chart, and a scatter plot.
• Distinguish between measures of central tendency, measures of
variability, and measures of shape.
• Understand conceptually the meanings of mean, median, mode,
quartile, percentile, and range.
• Differentiate between sample and population variance and
standard deviation.
• Understand box and whisker plots, skewness.
• Compute a coefficient of correlation and interpret it

2
Introduction

Descriptive statistics
• Involves the arrangement, summary and presentation of data, to
enable meaningful interpretation and support decision making.

• Descriptive statistics methods make use of:


–graphical descriptive methods

–numerical descriptive measures.

3
Introduction (cont)

• The methods presented apply to:


–the entire population, and
–a sample selected from the population.

4
2.1 Types of data

Variable
• A characteristic of a population or sample that is of
interest to us.

Examples
• cereal choice (brand names)
• capital expenditure
• waiting time for medical services.

5
2.1 Types of data (cont)

Data
• The actual (or observed) values of variables
–Numerical (quantitative) data are observations
taking real number values.
–Nominal (categorical) data are categorical
observations.
–Ordinal (ranked) data are ordered categorical
observations.

6
Types of data – examples

Numerical data

age   income
55    75 000
42    68 000

weight gain
+10
+5

Nominal data

person   married
1        yes
2        no
3        no

computer   brand
1          IBM
2          Dell
3          Compaq
4          IBM

With nominal data, all we can calculate is the proportion of data that falls
into each category.

7
Types of data – examples

Ordinal data

Exam grades
HD
D
C
P
F

8
Types of data – analysis

• Knowing the type of data is necessary to properly
select the technique to be used.
• Type of analysis allowed for each type of data:
–Numerical data: arithmetic calculations
–Nominal data: counting the number of observations
in each category and calculating their proportions
–Ordinal data: computations based on an ordering
process
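
As a rough illustration of the three bullets above, the following Python sketch applies one permitted calculation to each data type. The incomes, brands and grade labels come from the example slides; the variable names, the hypothetical list of exam results, and the choice of the median as the ordering-based computation are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean

# Numerical data: arithmetic calculations are meaningful.
incomes = [75_000, 68_000]
average_income = mean(incomes)

# Nominal data: count the observations in each category,
# then convert the counts to proportions.
brands = ["IBM", "Dell", "Compaq", "IBM"]
counts = Counter(brands)
proportions = {brand: n / len(brands) for brand, n in counts.items()}

# Ordinal data: computations based on ordering, here the median grade.
grade_order = ["F", "P", "C", "D", "HD"]   # lowest to highest
grades = ["P", "HD", "C", "F", "D"]        # hypothetical exam results
ranked = sorted(grades, key=grade_order.index)
median_grade = ranked[len(ranked) // 2]

print(average_income)
print(proportions)
print(median_grade)
```

Averaging the brand labels or the grade letters would not be meaningful, which is why the type of data has to be identified first.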

9
Cross-sectional & time-series data
• Cross-sectional data is collected at a certain point in
time across a number of units of interest
–marketing survey (observe preferences by gender,
age)
–test score in a statistics course exam
–starting salaries of graduates of an MBA program in
a particular year
• Time-series data is collected over successive points in
time
–weekly closing price of gold
–monthly tourist arrivals in Australia
10
2.2 Graphical techniques for nominal data

• The graphical presentations shown here are used
primarily for nominal data.

• These graphical tools are most appropriate when the
raw data can be naturally categorised in a meaningful
manner.

11
Pie charts
• The pie chart is a very popular tool used to represent
the proportions of appearance for nominal data despite
its drawbacks.
Example 2.1
–To determine the approximate market share of
various women’s magazines, a women’s magazine
readership survey was conducted using a sample of
200 readers.
–Data was collected and the count of the occurrences
was recorded for each magazine.
–These counts were converted to proportions and the
results were presented in a pie chart.

12
Pie charts (cont)
• The pie chart is a circle, subdivided into a number of slices
that represent the various categories.
• The size of each slice is proportional to the percentage
corresponding to the category it represents.
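Pie chart of women's magazine readership

[Figure: pie chart. Slices: Women's Weekly 20%, Other 26%, Better Homes 10%, That's Life 9%; the other two slices, 16% and 19%, belong to Woman's Day and New Idea.]

Example slice angle for a 10% share: (10/100)(360°) = 36°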
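
The conversion from counts to proportions and slice angles can be sketched in a few lines of Python. This is a minimal sketch: the function name `slice_angles` and the counts below are hypothetical placeholders, chosen only so that one category has a 10% share of a sample of 200.

```python
def slice_angles(counts):
    """Return {category: (proportion, angle in degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        category: (n / total, 360 * n / total)
        for category, n in counts.items()
    }

# Hypothetical counts for a sample of 200 readers (labels are placeholders).
counts = {"Magazine A": 20, "Magazine B": 40, "All others": 140}
angles = slice_angles(counts)

for category, (proportion, degrees) in angles.items():
    print(f"{category}: {proportion:.0%} of readers, {degrees:.0f} degrees")
```

A category with 20 of the 200 readers has a 10% share, so its slice spans (10/100)(360°) = 36°, and the angles of all slices add up to the full circle.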

13
PIE CHART PRESENTATION
The presentation of pie charts can be improved by observing
the following formatting guidelines:

• Use appropriate titles: The title should tell the reader what
variable the chart is summarizing
• Use appropriate labeling: Data labels should be used instead
of a legend
• Use only one color: Using the same color for all sectors
emphasizes that the categories are part of the same series
• Choose appropriate fonts: Use a large or bold font for titles &
a smaller one for data labels to avoid clutter
• Beware of explosions: Pulling the sectors a long way apart
destroys the integrity of the pie & should be avoided
• Start at midday: By default, Excel draws the first sector at
the 12 o'clock position
Source: Hunt, D.N. and Mashhoudy, H. (2008). The Humble Pie- Half-Baked or Well done?
Teaching Statistics, 30(1), 6-12
14
Bar charts

• Bar charts are an alternative to pie charts.

• The frequency (or relative frequency) of each category
is represented by a vertical bar.

15
Example 2.1 (cont)
(Excel representation)
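Bar chart of women's magazine readership

[Figure: Excel bar chart. Vertical axis: number of readers (0 to 60). Categories: Women's Weekly, Woman's Day, New Idea, Better Homes, That's Life, Other. The six data labels are 52, 40, 38, 32, 20 and 18 (200 readers in total).]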

16
Bar charts (cont)
–Use bar charts also when the order in which data
are presented is meaningful.
Total number of air conditioners manufactured in a factory, 1999–2004

[Figure: bar chart. Vertical axis: 0 to 20 000 units; horizontal axis: years 1999 to 2004.]

17
2.3 Graphical techniques for numerical data
Example 2.2
• Providing information concerning the monthly bills of new
subscribers in the first month after signing on with a
telephone company
–collect data
–prepare a frequency distribution
–draw a histogram

18
Example 2.2 (cont)

Collect data

Bills: 42.19, 38.45, 95.73, 104.80, 22.57, 92.97, 88.62, 115.50, 119.63, ...
(There are 200 data points.)

Prepare a frequency distribution

How many classes to use?

Number of observations    Number of classes
Less than 50              5–7
50–200                    7–9
200–500                   9–10
500–1 000                 10–11
1 000–5 000               11–13
5 000–50 000              13–17
More than 50 000          17–20

class width = [range] / [# of classes]
            = [largest observation – smallest observation] / [# of classes]
            = [119.63 – 0] / [8] = 14.95 ≈ 15
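
The class-width calculation can be reproduced with a short Python sketch. The function name `class_width` is hypothetical, and rounding up to the next whole number is an assumption made here so that the classes are sure to cover the full range; the slide itself simply rounds 14.95 to 15.

```python
import math

def class_width(largest, smallest, n_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    raw_width = (largest - smallest) / n_classes
    return math.ceil(raw_width)

# Largest bill 119.63, smallest 0, and 8 classes (200 observations fall in
# the 200–500 band boundary of the guideline table, which suggests 7–10).
width = class_width(119.63, 0, 8)
print(width)
```

With a width of 15 and 8 classes starting at 0, the classes run from 0 up to 120, which just covers the largest observation.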
19
Example 2.2 (cont)

Class limit      Frequency

0 up to 15       71
15 up to 30      37
30 up to 45      13
45 up to 60      9
60 up to 75      10
75 up to 90      18
90 up to 105     28
105 up to 120    14
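
Counting observations into "lower up to upper" classes can be sketched as follows. This is only an illustration: the function name is hypothetical, each class is assumed to include its lower limit and exclude its upper limit, and only the nine bills visible on the slide are used, so the counts do not match the full table of 200 observations.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count values into classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:      # ignore values outside the classes
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]

# The nine bills shown on the slide (the full data set has 200).
bills = [42.19, 38.45, 95.73, 104.80, 22.57, 92.97, 88.62, 115.50, 119.63]
table = frequency_distribution(bills, start=0, width=15, n_classes=8)

for lower, upper, frequency in table:
    print(f"{lower} up to {upper}: {frequency}")
```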

20
Example 2.2 (cont)

Draw a histogram

Bin    Frequency
15     71
30     37
45     13
60     9
75     10
90     18
105    28
120    14

[Figure: histogram of the bills. Horizontal axis: Bills (bins 15 to 120); vertical axis: Frequency (0 to 80).]

21
Example 2.2 …
What information can we extract from this histogram?

• About half of all the bills are small: 71 + 37 = 108
• A few bills are in the middle range: 13 + 9 + 10 = 32
• A relatively large number of large bills: 18 + 28 + 14 = 60

[Figure: histogram of the bills. Horizontal axis: Bills (bins 15 to 120); vertical axis: Frequency (0 to 80).]
22
Relative frequency

• It is often preferable to show the relative frequency
(proportion) of observations falling into each class,
rather than the frequency itself.

Class relative frequency = [class frequency] / [total number of observations]
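
Applied to the class frequencies of Example 2.2, the formula can be sketched in Python as below. The variable names are illustrative; the frequencies are those of the telephone bills frequency distribution.

```python
# Class frequencies for the telephone bills (classes 0 up to 15, ..., 105 up to 120).
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

total = sum(frequencies)                          # total number of observations
relative = [f / total for f in frequencies]       # class relative frequencies

for f, r in zip(frequencies, relative):
    print(f"{f:3d}  {r:.3f}")
```

The relative frequencies add up to 1, which is what makes histograms based on samples of different sizes comparable.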

23
Relative frequency (cont)

• Relative frequencies should be used when:


–the population relative frequencies are studied
–comparing two or more histograms
–the numbers of observations in the samples studied
are different.

24
Class width

• It is generally best to use equal class widths, but
sometimes unequal class widths are called for.
• Unequal class widths are used when the frequency
associated with some classes is too low. Then,
–several classes are combined together to form a
wider and ‘more populated’ class
–it is possible to form an open-ended class at the
higher or lower end of the histogram.

25
Shapes of histograms

• There are four typical shape characteristics

Symmetry

26
Shapes of histograms (cont)

Skewness

Negatively skewed

Positively skewed

27
Modal classes

• A modal class is the one with the largest number of
observations.

A unimodal histogram

The modal class


28
Modal classes (cont)

• A modal class is the one with the largest number of
observations.

A bimodal histogram

A modal class A modal class


29
Bell shape
• Many statistical techniques require that the population
be bell shaped.
–Drawing the histogram helps verify the shape of the
population in question.

30
Cumulative frequency of a class

• This is the number of measurements less than the
upper limit of that class.
• To obtain the cumulative frequency of a class, we add
the frequency of that class and the frequencies of all
previous classes.
• The cumulative relative frequency of a particular class
is the proportion of measurements that are less than
the upper limit of that class.

31
Ogives
• An ogive is a graph of a cumulative relative frequency distribution.

Example 2.2 cont.


Cumulative relative frequency for telephone bills

Class      Frequency   Cumulative frequency   Cumulative relative frequency
0-15       71          71                     71/200 = .355
15-30      37          108                    108/200 = .540
30-45      13          121                    121/200 = .605
45-60      9           130                    130/200 = .650
60-75      10          140                    140/200 = .700
75-90      18          158                    158/200 = .790
90-105     28          186                    186/200 = .930
105-120    14          200                    200/200 = 1.000
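
The running totals in this table can be reproduced with a short Python sketch. The variable names are illustrative; `itertools.accumulate` adds each class frequency to the frequencies of all previous classes, as described on the cumulative frequency slide.

```python
from itertools import accumulate

upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

# Cumulative frequency: frequency of the class plus all previous classes.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: proportion of measurements below each upper limit.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

# The ogive plots these proportions against the upper class limits.
for limit, c, cr in zip(upper_limits, cumulative, cumulative_relative):
    print(f"less than {limit:3d}: {c:3d}  {cr:.3f}")
```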

32
Ogive

[Figure: ogive for the telephone bills. Horizontal axis: upper class limits 15 to 120; vertical axis: cumulative % (0% to 120%).]

33
Example 2.5

–A small-business owner wants to assess the effects
of advertising on sales levels.
–Paired observation data were collected.
–Each pair consisted of monthly advertising
expenditure and monthly sales levels.

34
Scatter diagram
• A scatter diagram can describe the relationship
between advertising expenditure and sales.

35
Scatter diagram
Advert   Sales
1        30
3        40
5        40
4        50
2        35
5        50
3        35
2        25

[Figure: Excel scatter diagram. Horizontal axis: Advertising Expenditure (0 to 6); vertical axis: Sales (0 to 60).]
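
The learning objectives mention computing a coefficient of correlation, so a numerical companion to the scatter diagram may be useful. The sketch below computes the sample Pearson correlation coefficient for the eight advertising and sales pairs. The formula is the standard one and is not stated on these slides, so treat it, along with the function name `pearson_r`, as an assumption rather than the deck's own method.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Paired observations from Example 2.5.
advert = [1, 3, 5, 4, 2, 5, 3, 2]
sales = [30, 40, 40, 50, 35, 50, 35, 25]

r = pearson_r(advert, sales)
print(round(r, 2))
```

The result is roughly 0.8, which agrees with the upward pattern in the scatter diagram: months with higher advertising expenditure tend to have higher sales.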

36
Typical patterns
[Figure panels: positive linear relationship; no relationship; negative linear relationship; negative nonlinear relationship; nonlinear (concave) relationship.]

This is a possible relationship, but doesn’t the next one appear
to fit the data better?

37
2.5 Describing time-series data
• Time-series data are graphically depicted on a line
chart.
• A line chart of a time series plots the value of the
variable on the vertical axis and time periods on the
horizontal axis.
• A line chart is alternatively known as a time-series
chart.

38
Line chart

• Plot the frequency of a category above the point on the
horizontal axis representing that category.
• Use line charts when the categories are points in time.

39
Line charts (cont)

Total number of air conditioners manufactured in a factory, 1999–2004

[Figure: line chart. Vertical axis: 0 to 20 000 units; horizontal axis: years 1999 to 2004.]

Line charts are particularly useful when
the trend over time is to be emphasised.

40
Standards and benchmarks
• Many consider Charles Joseph Minard’s original time
series chart to be the best statistical graphic ever
drawn. Why?
• He took a two dimensional space and managed to
accurately depict five data variables:
–size of invading army
–size of retreating army
–geographic location
–temperature; and
–time.
• The multivariate data is presented in such a way as to
provide an intriguing narrative as to the fate of
Napoleon’s army.

41
Graphical excellence
Edward Tufte of Yale describes graphical excellence
as…
• The well-designed presentation of interesting data – a
matter of substance, of statistics, and of design;
• that gives the viewer the greatest number of ideas in
the shortest time with the least ink in the smallest
space;
• which is nearly always multivariate; and
• which requires telling the truth about the data.

42
Graphical excellence (cont)
• Graphical excellence deals with the effective use of
graphical techniques.
• Effective graphical techniques are:
–informative
–concise
–and give a clear presentation of the data to the
viewer.
• How can we achieve graphical excellence?

43
Graphical excellence (cont)

• Graphical excellence is achieved when:


–the graph presents large data sets concisely and
coherently
–the ideas and concepts to be delivered are clearly
understood by the viewer
–the graph encourages the viewer to compare
variables
–the display induces the viewer to address the
substance of the data, not the form of the graph
–and there is no distortion of what the data reveal.

44
Figure 3.1
• Graphical techniques should be used when there is a
large amount of data…

• These bar charts are completely unnecessary because:
– only three numbers are represented.
– there is no analysis associated with the data.
45
Figure 3.2
• Here is a pie chart that contains only 3 numbers

It catches your eye but provides no useful information.

46
Figure 3.3
• This is a pie chart that contains only 4 numbers

Here, a table would easily suffice.


47
Figure 3.4

• A bar chart that contains only 6 numbers

–Remove the numbers and it's
difficult to understand the bars
on the chart.

–Remove the graph, and the
numbers still speak for
themselves.

48
Graphical deception

• Graphical techniques create a visual impression,
which is easy to distort, therefore…
• It is more important than ever to be able to critically
evaluate the graphically presented information.
• Be wary of graphs without a scale on one axis.
• Understand the information being presented: absolute
values, relative values (e.g. percentages, deltas).
• Are the horizontal or vertical axes distorted in any
way?

49
• Is there a missing scale on one axis?
• Are changes presented in absolute values only, or in percentage form too?
• Has any axis been stretched?

[Figure: three example charts, with axis labels Time and Dollars, value labels 100.0, 110.0, 120.0 and (1%), (2%), (3%), a 10% change marked twice, and time scales Aug. 99 to Sept. 99 and 1989 to 1999.]
50
Written reports
• Here is one suggested method for structuring a
report that presents statistical information & analysis
to others:
1) Objective statement
2) Description of the experiment
3) Results
– Describe using words, tables, and charts.
4) Discussion of limitations
– Discuss problems with the analysis
– Include violations of required conditions,
assumptions, etc.

51
References

• Selvanathan (2010), Australian Business Statistics, 5th edition,
Cengage, Chapters 1-4
• Berenson et al. (2009), Basic Business Statistics, 2nd edition,
Pearson, Chapters 1-3
• Black, Asafu-Adjaye, Khan, Perera, Edwards, Harris (2010),
Australasian Business Statistics, 2nd edition, Wiley, Chapters 1-3
• Mansfield, E. (1994), Statistics for Business and Economics,
5th edition, Norton, New York, Chapters 1-3

52
