Lesson 3
OF
AGRICULTURE & TECHNOLOGY
JKUAT SODeL
Nairobi, Kenya
E-mail: elearning@jkuat.ac.ke
STA 2100 Probability and Statistics I
LESSON 3
Presentation of Data
Learning outcomes
sentation.
Compare the presentations of the same set of data by using various graphs.
Understand the criterion for the selection of a method to organize and present data.
Identify the different methods of data organization and presentation.
Identify sources of deception in misleading graphs.
©2014
JKUAT: Setting trends in higher Education, Research and Innovation
3.1. Introduction
The raw data collected through the various methods of data
collection are usually in a haphazard and unsystematic form and are
not suitably arranged for drawing conclusions about the group
Reading lots of numbers in the text puts people to sleep and
does little to convey information. Tables are the most commonly
used form of data graphics, but graphs, charts or diagrams that
include symbols and pictures will get your results across to the
3.2. Tables
Once we have collected our data, often the first stage of any
analysis is to present them in a simple and easily understood
way. Tables are perhaps the simplest means of presenting data.
There are many types of tables. For example, we have all
seen tables listing sales of computers by type, or exchange rates,
or the financial performance of companies. These types of tables
can be very informative. However, they can also be difficult to
interpret, especially those which contain vast amounts of data.
Frequency tables are amongst the most commonly used tables
and are perhaps the most easily understood. They can
be used with continuous, discrete, categorical and ordinal data.
If the information is presented in tabular form or in a descriptive
record, it becomes difficult to draw results.
Graphical form makes it possible to easily draw visual impressions
of data.
age–sex composition, occupational structure, etc.
When creating graphic displays, keep in mind the following questions:
What am I trying to communicate?
Who is my audience?
What might prevent them from understanding this display?
Does the display tell the entire story?
Some Rules of Thumb
show the data
avoid distorting the data
induce the viewer to think about the substance of the graphic rather than the methodology, graphic design, or something else
make large amounts of data coherent
encourage the viewer to use the graphic as you intend, e.g. make comparisons
be as simple as possible
In the following section we discuss some of the commonly used
graphical presentations.
municate simple ideas, for example market share of ISP in a
country. They are used to show the proportions of a whole.
They are best used when there are only a handful of categories
to display.
Accidental Deaths in the USA in 2002.
Type             Frequency
Motor Vehicle    43,500
Falls            12,200
Poison           6,400
Drowning         4,600
Fire             4,200
Pie Chart
To create a pie chart for the data, find the relative frequency
(percent) of each category.
Type             Frequency    Relative Frequency
Motor Vehicle    43,500       0.578
Falls            12,200       0.162
Poison           6,400        0.085
Pie Chart
Next, find the central angle. To find the central angle,
multiply the relative frequency by 360°.
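The central-angle step can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the relative frequencies are the percentages printed on the lesson's pie chart for the accidental deaths data, and the function name `central_angles` is a hypothetical helper introduced here.

```python
# Relative frequencies (as proportions) taken from the percentages
# shown on the lesson's pie chart of accidental deaths.
relative_frequency = {
    "Motor vehicles": 0.578,
    "Falls": 0.162,
    "Poison": 0.085,
    "Drowning": 0.061,
    "Fire": 0.056,
    "Ingestion": 0.039,
    "Firearms": 0.019,
}


def central_angles(rel_freq):
    """Return the central angle, in degrees, for each category.

    central angle = relative frequency x 360
    (hypothetical helper, named here for illustration only)
    """
    return {category: round(p * 360, 1) for category, p in rel_freq.items()}


angles = central_angles(relative_frequency)
for category, angle in angles.items():
    print(f"{category:<15}{angle:>7.1f} degrees")
```

Because the relative frequencies sum to 1, the central angles should sum to 360°, which is a quick check that no category has been left out.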
Pie Chart
[Figure: pie chart of the accidental deaths data. Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%, Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%.]
Athiany, HKO 31
3.3.2. Bar Charts
Bar charts are a commonly used and clear way of presenting categorical
data or any ungrouped discrete frequency observations.
The five-step process for creating a bar chart is shown below:
3. Having decided on a range for the frequency axis we need
to decide on a suitable number scale to label this axis.
This should have sensible values, for example, 0, 1, 2, ...,
or 0, 10, 20, ..., or other such values as make sense
Example. Using the following data, which represent the number of
guests booked in a hotel in Mombasa on a particular
day in December 2013, construct a suitable bar
graph.
Mainland European 2 2
Rest of the world 5 7
Total 21 26
The corresponding bar graph is as shown below.
3.3.3. Histograms
Bar charts have their limitations; for example, they cannot be
used to present continuous data. When dealing with continuous
random variables a different kind of graph is required. This is
frequency.
2. The range of the horizontal (x–axis) needs to include not
only the full range of observations but also the full range
of the class intervals from the frequency table.
3. Draw a bar for each group in your frequency table. These
Example. The following data represent the ages of 30 IT
students in a statistics class. Construct a histogram using these
data.
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 38 37 22
30 39 32 44 33 46
54 49 18 51 21 21
Before constructing the histogram, we need to first construct a
frequency distribution, then finally construct the histogram. In
wise, it may be difficult to detect any patterns.
3. Find the class width as follows. Determine the range of
the data, divide the range by the number of classes, and
round up to the next convenient number.
4. Find the class limits. You can use the minimum entry as
the lower limit of the first class. To find the remaining
lower limits, add the class width to the lower limit of the
Obtain the number of classes (k) as follows: k = roundup(log(n)
The minimum data entry is 18 and maximum entry is 54,
so the range is 36. Divide the range by the number of
classes to find the class width.
– Class width = 36/5 = 7.2, rounded up to 8
The minimum data entry of 18 may be used for the lower
limit of the first class. To find the lower class limits of the
In summary, the frequency distribution is as follows:
42 – 49    3
50 – 57    2
Σf = 30
Check that the sum equals the number in the sample.
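The construction of this frequency distribution can be sketched in Python. This is an illustrative sketch under stated assumptions, not the notes' own procedure verbatim: the number of classes is simply taken as k = 5, as in the worked example, the data are assumed to be whole numbers, and `frequency_distribution` is a hypothetical helper name.

```python
# Ages of the 30 IT students, as listed in the example.
ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 38, 37, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]


def frequency_distribution(data, k):
    """Build k equal-width classes starting at the minimum entry.

    Returns a list of (lower_limit, upper_limit, frequency) tuples.
    Assumes whole-number data, so each upper limit is one less than
    the next lower limit.
    """
    data_range = max(data) - min(data)      # 54 - 18 = 36
    # Smallest whole number strictly greater than range / k,
    # e.g. 36 / 5 = 7.2 gives a class width of 8.
    width = data_range // k + 1
    lower = min(data)
    table = []
    for _ in range(k):
        upper = lower + width - 1
        frequency = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, frequency))
        lower += width                      # next lower class limit
    return table


table = frequency_distribution(ages, k=5)
for lower, upper, frequency in table:
    print(f"{lower} - {upper}: {frequency}")

total = sum(frequency for _, _, frequency in table)
print("Sum of f =", total)                  # should equal the sample size
```

The final line mirrors the check in the table: the frequencies must add up to the number of observations in the sample.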
Frequency Histogram
A frequency histogram is a bar graph that represents
the frequency distribution of a data set.
1. The horizontal scale is quantitative and measures
Class Boundaries
Let's consider the class boundaries for the “Ages of the IT Students”
frequency distribution.
Ages of Students
Class      Frequency, f   Class Boundaries
18 – 25    13             17.5 – 25.5
26 – 33    8              25.5 – 33.5
The distance from the upper limit of the first class to the
And finally, the histogram:
Frequency Histogram
To draw a frequency histogram for the “Ages of Students”
frequency distribution, we use the class boundaries.
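As a rough sketch of how the class boundaries feed into the histogram, the Python below derives the boundaries from the class limits and prints a simple text histogram. It assumes whole-number class limits with a gap of 1 between consecutive classes (25 to 26, 33 to 34, and so on); `class_boundaries` is a hypothetical helper, and the text bars merely stand in for a drawn chart.

```python
# Class limits and frequencies from the "Ages of Students" distribution.
classes = [(18, 25, 13), (26, 33, 8), (34, 41, 4), (42, 49, 3), (50, 57, 2)]


def class_boundaries(lower, upper, gap=1):
    """Extend each class limit by half the gap between consecutive classes.

    With whole-number limits the gap is 1, so 18 - 25 becomes 17.5 - 25.5.
    """
    half = gap / 2
    return lower - half, upper + half


boundaries = [class_boundaries(lower, upper) for lower, upper, _ in classes]

# Text histogram: one row per class, one '#' per observation.
for (low, high), (_, _, frequency) in zip(boundaries, classes):
    bar = "#" * frequency
    print(f"{low:>5} - {high:<5} | {bar} {frequency}")
```

Because each upper boundary equals the next lower boundary, the bars of the drawn histogram touch, unlike the bars of a bar chart.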
[Figure: frequency histogram, “Ages of Students”. Horizontal axis (broken): age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: frequency f. Bar heights: 13, 8, 4, 3, 2.]
You may have noticed that we referred to the above histogram
as a frequency histogram. Instead of constructing a frequency
histogram, we may also be interested in constructing a
relative frequency histogram. The process is quite similar to
Relative Frequency
First, we need to find the relative frequencies for the “Ages of IT
Students” frequency distribution as follows.
Class      Frequency, f   Relative Frequency (portion of students)
18 – 25    13             0.433
26 – 33    8              0.267
34 – 41    4              0.133
42 – 49    3              0.1
50 – 57    2              0.067
           Σf = 30        Σ(f/n) = 1
For the first class, f/n = 13/30 ≈ 0.433.
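The relative frequencies can be reproduced with a few lines of Python. This is a minimal sketch: the class labels are plain strings chosen here for illustration, and each relative frequency is the class frequency f divided by the sample size n.

```python
# Class frequencies from the "Ages of IT Students" distribution.
frequencies = {
    "18 - 25": 13,
    "26 - 33": 8,
    "34 - 41": 4,
    "42 - 49": 3,
    "50 - 57": 2,
}

n = sum(frequencies.values())               # sample size, 30

# Relative frequency of a class = f / n
relative = {label: f / n for label, f in frequencies.items()}

for label, p in relative.items():
    print(f"{label}: {p:.3f}")

print("Sum of relative frequencies:", round(sum(relative.values()), 3))
```

The relative frequencies always sum to 1 (up to rounding), so the relative frequency histogram has the same shape as the frequency histogram and differs only in its vertical scale.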
[Figure: relative frequency histogram, “Ages of Students”. Horizontal axis: age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: relative frequency (portion of students).]
3.3.4. Frequency Polygons
These are a natural extension of the relative frequency histogram.
They differ in that, rather than drawing bars, each
class is represented by one point and these are joined together
each class.
4. Join adjacent points together with straight lines.
A frequency polygon is a line graph that emphasizes the continuous
change in frequencies.
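The points of a frequency polygon for the IT students' ages can be sketched as follows. This assumes each class is plotted at its midpoint and that the line is extended to the x-axis one class width before the first midpoint and one after the last, as in the lesson's figure; `polygon_points` is a hypothetical helper name.

```python
# Class limits and frequencies from the "Ages of Students" distribution.
classes = [(18, 25, 13), (26, 33, 8), (34, 41, 4), (42, 49, 3), (50, 57, 2)]

# Class width = difference between consecutive lower limits (26 - 18 = 8).
width = classes[1][0] - classes[0][0]


def polygon_points(classes, width):
    """Return (midpoint, frequency) pairs, closed off on the x-axis."""
    points = [((lower + upper) / 2, f) for lower, upper, f in classes]
    first_mid = points[0][0]
    last_mid = points[-1][0]
    # Extend the line to the x-axis on both sides with zero frequency.
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


points = polygon_points(classes, width)
for midpoint, f in points:
    print(f"midpoint {midpoint:>5}: f = {f}")
```

Joining these points in order with straight lines gives the polygon; the two zero-frequency end points are what anchor it to the horizontal axis.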
[Figure: frequency polygon, “Ages of Students”. Horizontal axis (broken): age (in years), marked at the class midpoints 13.5, 21.5, 29.5, 37.5, 45.5, 53.5, 61.5. Vertical axis: frequency f. The line is extended to the x-axis.]
3.3.5. Cumulative Frequency Polygons (Ogive)
Cumulative percentage relative frequency is also a useful tool.
The cumulative percentage relative frequency is simply the sum
of the percentage relative frequencies at the end of each class
4. Join adjacent points, starting at 0% at the lowest class
boundary.
Example. Use the IT students' age data to construct an
ogive.
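A minimal Python sketch of the ogive's coordinates is given below. It takes the class boundaries and frequencies of the IT students' age distribution as given, starts at zero at the lowest class boundary, and plots each running total at the upper boundary of its class. Both the cumulative frequency and the cumulative percentage are computed; the variable names are illustrative only.

```python
from itertools import accumulate

# Class boundaries and frequencies for the IT students' ages.
boundaries = [17.5, 25.5, 33.5, 41.5, 49.5, 57.5]
frequencies = [13, 8, 4, 3, 2]

# Running totals, starting from 0 at the lowest class boundary.
cumulative = [0] + list(accumulate(frequencies))
n = cumulative[-1]                          # sample size

# Cumulative percentage relative frequency at each boundary.
cumulative_percent = [100 * c / n for c in cumulative]

# Each point pairs a class boundary with the total accumulated up to it.
ogive_points = list(zip(boundaries, cumulative))

for boundary, c, pct in zip(boundaries, cumulative, cumulative_percent):
    print(f"{boundary:>5}: cumulative f = {c:>2} ({pct:.1f}%)")
```

Joining the points in order gives a curve that never decreases and ends at the sample size (or 100%) at the upper boundary of the last class.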
[Figure: ogive, “Ages of Students”. Horizontal axis: age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: cumulative frequency, 0 to 30. The graph ends at the upper boundary of the last class.]
3.3.6. Scatter Plots
Scatter plots are used to plot two variables which you believe
might be related, for example, height and weight, advertising
expenditure and sales, or age of machinery and maintenance
costs.
3.3.8. Stem and leaf plots
Stem and leaf plots are a quick and easy way of representing data
graphically. They can be used with both discrete and continuous
data. The method for creating a stem and leaf plot is similar
widths for a stem and leaf plot must be equal. Because of the
way the plot works it is best to use “sensible” values for the
interval width – i.e. 5, 10, 100, 1000; if a data set consists of
many small values, this interval width could also be 1, or even 0.1
or 0.01. Once we have decided on our intervals we can construct
the stem and leaf plot.
Consider the following data: 11, 12, 9, 15, 21, 25, 19, 8. The
first step is to decide on interval widths – one obvious choice
would be to go up in 10s. This would give a stem unit of 10 and
a leaf unit of 1. The stem and leaf plot is constructed as below.
Stem units: 10, leaf digits: 1 (the value 8.000 is represented by 0|8)
0|89
1|1259
2|15
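The same plot can be produced with a short Python sketch. It assumes non-negative whole numbers, a stem unit of 10 and a leaf unit of 1; `stem_and_leaf` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict


def stem_and_leaf(data, stem_unit=10):
    """Return the rows of a stem-and-leaf plot as strings like '1|1259'.

    Assumes non-negative whole numbers and a leaf unit of 1.
    """
    stems = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, stem_unit)   # e.g. 19 -> stem 1, leaf 9
        stems[stem].append(leaf)
    rows = []
    # Include every stem between the smallest and largest, even if empty,
    # so that gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem}|{leaves}")
    return rows


plot = stem_and_leaf([11, 12, 9, 15, 21, 25, 19, 8])
print("\n".join(plot))
```

Sorting the data first means the leaves on each row appear in increasing order, which is the usual convention for an ordered stem-and-leaf plot.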
In a stem-and-leaf plot, each number is separated into a stem
(usually the entry's leftmost digits) and a leaf (usually the rightmost
digit). This is an example of exploratory data analysis.
Using the IT students' age data set, we can construct a stem-and-leaf
plot as follows:
Stem-and-Leaf Plot
Ages of Students
Key: 1|8 = 18
1 | 888999
2 | 0011124799
3 | 002234789
4 | 469
5 | 14
Most of the values lie
This graph allows us to see the shape of the data as well as the actual values.
Stem-and-Leaf Plot
Constructing a stem-and-leaf plot that has two lines for each stem.
Ages of Students
Key: 1|8 = 18
1 |
1 | 888999
2 | 0011124
2 | 799
3 | 002234
3 | 789
4 | 4
4 | 69
5 | 14
5 |
From this graph, we can conclude that more than 50% of the data lie between 20 and 34.
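A two-lines-per-stem plot can be sketched in Python as below. The sketch assumes whole-number ages with a stem unit of 10, and puts leaves 0 to 4 on the first line of each stem and leaves 5 to 9 on the second; `split_stem_and_leaf` is a hypothetical helper. It also counts the share of ages between 20 and 34 to check the conclusion drawn from the plot.

```python
# Ages of the 30 IT students, as listed in the histogram example.
ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 38, 37, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]


def split_stem_and_leaf(data):
    """Stem-and-leaf rows with two lines per stem.

    The first line of a stem holds leaves 0-4, the second leaves 5-9.
    """
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows.setdefault((stem, leaf >= 5), []).append(str(leaf))
    lines = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for upper_half in (False, True):
            leaves = "".join(rows.get((stem, upper_half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines


lines = split_stem_and_leaf(ages)
print("\n".join(lines))

# Share of the ages lying between 20 and 34 inclusive.
share = sum(20 <= a <= 34 for a in ages) / len(ages)
print(f"Share between 20 and 34: {share:.0%}")
```

Splitting each stem spreads the data over more rows, which can reveal detail in the shape of the distribution that a one-line-per-stem plot hides.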
Revision Questions
frequency polygon.
Learning Activities
1. Read more on scatter plots and box plots and summarize
their use, advantages and disadvantages versus the methods
we have presented in this lecture. If possible, give
some examples of a scatter plot and a box plot.
2. Search for at least five bad graphs and discuss why they
are bad.
Solutions to Exercises
Exercise 1. A pie chart is mostly useful in displaying a relative
frequency (percentage) distribution, similar to a bar chart,
while a histogram is useful for revealing the general pattern or