
Summary Statistics

 
Measures of Central Tendency

A measure of central tendency is a value used to represent the
typical or “average” value in a data set.
There are 4 values that are considered measures of the center.
1. Mean
2. Median
3. Mode
4. Midrange
 
Measures of central tendency for raw data

• Suppose you are weighing babies born at your clinic somewhere in
Malawi, and the baby weights (in kg) of the first 10 babies were as
follows:
2.7, 3.4, 3.0, 4.1, 5.2, 1.9, 2.3, 3.0, 3.3, 3.0
What single figure could represent the baby weights at this clinic?

Let's see how different measures of central tendency are computed.


The mode

• The mode is the data value that occurs most frequently in the set.
• If no data value is repeated, we say there is no mode.
Using the following data set:
2.7kg, 3.4kg, 3.0kg, 4.1kg, 5.2kg, 1.9kg, 2.3kg, 3.0kg, 3.3kg, 3.0kg.
The mode is 3.0kg, which occurs three times (the highest frequency).
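As a rough illustration, the mode can be found by counting how often each value occurs. The sketch below uses Python's standard `collections.Counter`; the function name `mode_or_none` is made up for this example, and it assumes the "no mode if nothing repeats" convention stated above. If several values tie for the highest frequency, it simply returns the first one encountered.

```python
from collections import Counter

# Baby weights (kg) from the clinic example.
weights = [2.7, 3.4, 3.0, 4.1, 5.2, 1.9, 2.3, 3.0, 3.3, 3.0]

def mode_or_none(values):
    """Return the most frequent value, or None if no value is repeated."""
    counts = Counter(values)
    value, frequency = counts.most_common(1)[0]
    # Convention from the slide: if nothing repeats, there is no mode.
    return value if frequency > 1 else None

mode = mode_or_none(weights)
print(mode)  # 3.0
```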
The Median

• The median is defined as the middle figure after the data set is ranked,
i.e. placed in order of magnitude.

• The median M of a set of N observations which have been ranked in
order of size is equal to the value taken by the middle (the ½[N+1]th)
observation when N is odd, and is half the sum of the values of the
two middle observations (the ½Nth and [½N+1]th) when N is even.

Example
22, 29, 35, 24, 26, 15, 28, 36, 45, 21, 33, 5, 46, 21, 19, 41, 5, 84, 58, 63,
5, 23
Find the median.
Solution
Rank the data in ascending order
5, 5, 5, 15, 19, 21, 21, 22, 23, 24, 26, 28, 29, 33, 35, 36, 41, 45, 46, 58,
63, 84
• Then pick the two middle numbers, the 11th and 12th (because the
number of observations, N = 22, is even).
• The two middle figures are 26 and 28. The median is the average of
these two figures: (26 + 28)/2 = 27.
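The ranking rule can be sketched in a few lines of Python. This is a minimal illustration of the odd/even definition, not a replacement for a library routine; the function name `median` is chosen for the example (Python's standard `statistics.median` follows the same rule).

```python
def median(values):
    """Median by the ranking rule: middle value, or mean of the two middle values."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # N odd: the (N+1)/2-th observation (index N//2 when counting from 0).
        return ranked[middle]
    # N even: half the sum of the N/2-th and (N/2 + 1)-th observations.
    return (ranked[middle - 1] + ranked[middle]) / 2

data = [22, 29, 35, 24, 26, 15, 28, 36, 45, 21, 33,
        5, 46, 21, 19, 41, 5, 84, 58, 63, 5, 23]

result = median(data)
print(result)  # 27.0
```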
 
The Arithmetic Mean
• This is another measure of the centre of a set of observations
• The (arithmetic) mean of a set of observations is the sum of the
observations divided by the number of observations
• The mean of a sample data set is denoted by $\bar{x}$
• The mean of a population data set is denoted by $\mu$
Example
• The following data are the journey times of college students from their
place of residence to College:
17, 30, 14, 16, 26, 15, 27, 18, 26 minutes
• The mean of the journey times is
(17+30+14+16+26+15+27+18+26)/9 = 189/9 = 21 minutes
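The same arithmetic can be checked directly in Python. This is only a sketch; the variable names are chosen for the illustration.

```python
# Journey times (minutes) from the example.
journey_times = [17, 30, 14, 16, 26, 15, 27, 18, 26]

total = sum(journey_times)               # sum of the observations
count = len(journey_times)               # number of observations
mean_time = total / count                # arithmetic mean

print(total, count, mean_time)  # 189 9 21.0
```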
Median and mean of grouped discrete data

• Consider the letters example:
A man kept count of the number of letters he received each day over a
period of 100 days (excluding Sundays). The observations were:
0 2 1 1 1 2 0 0 1 0 1 1 0 0 0 3 1 2 0 1
1 0 0 1 0 1 1 0 2 0 0 0 1 0 1 0 2 1 2 0
0 2 0 1 0 1 0 1 0 3 1 2 0 0 0 0 1 0 0 0
1 0 1 0 1 0 2 0 1 2 1 2 0 1 0 2 2 1 0 1
0 0 0 0 5 0 1 1 2 0 0 2 1 0 2 0 0 2 1 0
Example 1: Frequency table and bar diagram of a discrete variable

• Tally count and frequency table


No. of letters per day     0    1    2    3    4    5
Frequency                 48   32   17    2    0    1
Cumulative frequency      48   80   97   99   99  100

Calculate: (a) the median; (b) the mean of the letter data

• (a) Median. There are 100 observations. The median is half the sum of
the 50th and 51st observations in the ranked order. We see from the
cumulative frequencies that both these observations equal 1. Hence
the median number of letters per day is 1.
• (b) Mean. Of the 100 observations 48 are 0’s, 32 are 1’s, etc. Hence
the mean equals (48×0 + 32×1 + 17×2 + 2×3 + 0×4 + 1×5)/100 = 77/100 = 0.77.
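The calculation from the frequency table can be sketched as follows. The helper `value_at_rank` is a hypothetical name introduced for this illustration; it reads the cumulative frequencies to find which value the rank-th observation takes.

```python
from itertools import accumulate

# Frequency table for the letters example.
letters = [0, 1, 2, 3, 4, 5]
frequency = [48, 32, 17, 2, 0, 1]

n = sum(frequency)                          # total number of observations
cumulative = list(accumulate(frequency))    # running totals of the frequencies

def value_at_rank(rank):
    """Value of the rank-th observation (counting from 1) in ranked order."""
    for value, cum in zip(letters, cumulative):
        if rank <= cum:
            return value
    raise ValueError("rank exceeds the number of observations")

if n % 2 == 0:
    # N even: half the sum of the N/2-th and (N/2 + 1)-th observations.
    median = (value_at_rank(n // 2) + value_at_rank(n // 2 + 1)) / 2
else:
    median = value_at_rank((n + 1) // 2)

# Mean: each value weighted by its frequency, divided by N.
mean = sum(v * f for v, f in zip(letters, frequency)) / n

print(cumulative)    # [48, 80, 97, 99, 99, 100]
print(median, mean)  # 1.0 0.77
```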
Exercises
• Work on the following exercises from Clarke & Cooke
• 2.2.2
• 2.3.2
Median and mean of grouped continuous data

Example
• Calculate (a) the median; (b) the mean, of the following data on the
height in centimetres of 10 plants in pots. The data have been
grouped.

Class-interval (cm)     11.5-16.5   16.5-21.5   21.5-26.5   26.5-31.5
Class centre (cm)         14.0        19.0        24.0        29.0
Frequency                  2           5           2           1
Cumulative frequency       2           7           9          10
(a) Median
• There are 10 observations, so the median is half the sum of the 5th
and 6th observations. The cumulative frequencies show that both lie
inside the interval (16.5-21.5). If we assume that the five observations
which lie in this interval are equally spread out within it, and put each
at the centre of its own small interval of width 1, they sit at
17.0, 18.0, 19.0, 20.0 and 21.0.
• The 5th and 6th observations are therefore 19.0 and 20.0. Their
average lies at the end of the third of the five equal intervals into
which the interval (16.5, 21.5) is divided, and is 19.5.
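The equal-spread assumption can be made concrete with a short sketch. The function names `spread_within_classes` and `grouped_median` are invented for this illustration; the code places each observation at the centre of its own sub-interval, exactly as described, and then applies the usual odd/even rule.

```python
# Class boundaries (cm) and frequencies from the plant-height table.
boundaries = [(11.5, 16.5), (16.5, 21.5), (21.5, 26.5), (26.5, 31.5)]
frequency = [2, 5, 2, 1]

def spread_within_classes(boundaries, frequency):
    """Place each observation at the centre of its own equal sub-interval."""
    positions = []
    for (lower, upper), f in zip(boundaries, frequency):
        if f == 0:
            continue  # an empty class contributes no observations
        sub_width = (upper - lower) / f
        positions.extend(lower + (k + 0.5) * sub_width for k in range(f))
    return positions

def grouped_median(boundaries, frequency):
    """Median of the assumed positions, using the odd/even ranking rule."""
    positions = spread_within_classes(boundaries, frequency)
    n = len(positions)
    if n % 2 == 1:
        return positions[n // 2]
    return (positions[n // 2 - 1] + positions[n // 2]) / 2

median_height = grouped_median(boundaries, frequency)
print(median_height)  # 19.5
```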
(b) Mean
• When calculating the mean from grouped continuous data we act as if
all the observations in a given interval are equal in value to the
class-centre of that interval.
• We then proceed as with grouped discrete data. The mean is
therefore
(2×14.0 + 5×19.0 + 2×24.0 + 1×29.0)/10 = 200/10 = 20.0 cm.
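A minimal sketch of the grouped mean, treating every observation in a class as equal to its class centre. Variable names are chosen for the illustration; the class centres are computed from the boundaries rather than typed in.

```python
# Class boundaries (cm) and frequencies from the plant-height table.
boundaries = [(11.5, 16.5), (16.5, 21.5), (21.5, 26.5), (26.5, 31.5)]
frequency = [2, 5, 2, 1]

# Class centre = midpoint of the class interval.
centres = [(lower + upper) / 2 for lower, upper in boundaries]

n = sum(frequency)
# Every observation in a class is treated as equal to the class centre.
grouped_mean = sum(c * f for c, f in zip(centres, frequency)) / n

print(centres)       # [14.0, 19.0, 24.0, 29.0]
print(grouped_mean)  # 20.0
```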

Exercise
• Try exercise 2.3.2???
Σ-notation (Sigma notation)
• We can express the definition of the arithmetic mean in a simple
formula using the Σ-notation.
• For example, in our student journey times example, we can represent
each journey time by $x_i$ as below

i       1    2    3    4    5    6    7    8    9
x_i    17   30   14   16   26   15   27   18   26

• The whole set of observations is {$x_i$: i = 1, 2, …, 9}


• The sum of a set of N observations, i.e. $x_1 + x_2 + \dots + x_N$, may be
written $\sum_{i=1}^{N} x_i$, by introducing the sign $\Sigma$, which stands
for ‘the sum of’.

• The expression $\sum_{i=1}^{N} x_i$ is read “sigma $x_i$, $i$ equals one to
$N$” and is defined as follows.

• Definition: $\sum_{i=1}^{N} x_i$ is the sum of the quantities $x_i$, as $i$
takes successively the values 1, 2, …, N.

• The summation is said to be over $i$, which is the index of summation,
and the range of summation is from 1 to N.

• The arithmetic mean of the set of observations $x_i$ ($i$ = 1, 2, …, N) may
be represented by $\bar{x}$ (x-bar).
Σ-notation

• The (arithmetic) mean of the observations is
$$ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i $$
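To connect the notation with computation, the sketch below spells out the summation over the index i for the journey-time data. Python lists are indexed from 0, so the observation written $x_i$ is stored at `x[i - 1]`; the variable names are chosen for this illustration.

```python
# Journey times: x_1, x_2, ..., x_9.
x = [17, 30, 14, 16, 26, 15, 27, 18, 26]
N = len(x)

# Sigma: let the index i take the values 1, 2, ..., N and add up the x_i.
total = 0
for i in range(1, N + 1):
    total += x[i - 1]   # x_i lives at position i - 1 in a 0-based list

# The built-in sum() performs the same summation in one step.
assert total == sum(x)

x_bar = total / N       # arithmetic mean: (1/N) times the sum
print(total, x_bar)     # 189 21.0
```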


Σ-notation: exercise

1. Show that
(i) , (ii) , (iii)
(iv) , (v) , (vi)
2. If (=1, 2, 3) and (=1, 2, 3) take the values shown in the
following table

6 1 2 5 3 4
confirm the following relations
Σ-notation: exercise

(i) , (ii) , (iii)


(iv) ,
(v)
Rules of operation with Σ

The following rules apply with Σ:


1. ,

2. , where c is a constant;

3. , where c is a constant;

4. , provided
The Arithmetic Mean
• Mean is given by
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
where n is the number of observations in the sample.

Example
Use the following data set to compute a sample mean

1.65kg, 3.3kg, 4.1kg, 3.0kg, 3.1kg, 2.9kg, 2.8kg, 3.2kg, 3.0kg, 3.0kg

$$ \bar{x} = \frac{1.65 + 3.3 + 4.1 + 3.0 + 3.1 + 2.9 + 2.8 + 3.2 + 3.0 + 3.0}{10} = \frac{30.05}{10} = 3.005\ \text{kg} $$
Measures of Dispersion
Dispersion
• The measure of the spread or variability

• No Variability – No Dispersion
Measures of Variation
• There are 3 values that we will look at to measure
the amount of dispersion or variation (the spread
of the group).

1. Range
2. Standard Deviation
3. Quartile deviation
Why is it Important?
• You want to choose the best brand of
medicine for your patients. You are
interested in how long each drug takes to
cure a disease. The choices are narrowed
down to 2 different drugs. The results are
shown in the chart. Which drug would
you choose?
The chart indicates the number of days a drug takes to cure a
particular disease.

          Drug A   Drug B
            10       35
            60       45
            50       30
            30       35
            40       40
            20       25
Total      210      210
Does the Average Help?
• Drug A: Avg = 210/6 = 35 days

• Drug B: Avg = 210/6 = 35 days

• Both drugs take 35 days on average to cure the disease, so the
average is no help in deciding which to choose.
Consider the Spread
• Drug A: Spread = 60 – 10 = 50 days

• Drug B: Spread = 45 – 25 = 20 days

• Drug B has a smaller variability, which means that it
performs more consistently. Choose drug B.
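The comparison of the two drugs can be reproduced with a short sketch. The helper names `mean` and `value_range` are made up for this illustration; "spread" here means the range, i.e. highest value minus lowest value.

```python
# Days to cure, from the chart.
drug_a = [10, 60, 50, 30, 40, 20]
drug_b = [35, 45, 30, 35, 40, 25]

def mean(values):
    """Arithmetic mean: sum divided by the number of observations."""
    return sum(values) / len(values)

def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

mean_a, mean_b = mean(drug_a), mean(drug_b)
range_a, range_b = value_range(drug_a), value_range(drug_b)

print(mean_a, mean_b)    # 35.0 35.0  (the averages cannot separate the drugs)
print(range_a, range_b)  # 50 20      (drug B is the more consistent)
```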
Range
• The range is the difference between the highest
value in the set and the lowest value in the set.

• Range = highest value − lowest value


Example
• Find the range of the data set.

• 40, 30, 15, 2, 100, 37, 24, 99

• Range = 100 – 2 = 98
Deviation from the Mean
• A deviation from the mean, $x - \bar{x}$, is the difference
between the value of $x$ and the mean $\bar{x}$.

We base our formulas for variance and standard
deviation on the amounts by which the observations
deviate from the mean.
Formulae for sample and population
variances
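Computation formulae (sample variance $s^2$, population variance $\sigma^2$):
$$ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}, \qquad \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} $$

Definition formulae:
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} $$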
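The definition and computation formulae give the same value; the sketch below checks this numerically. The function names are invented for the illustration, and the Drug A cure times from the earlier chart are used as test data. The only difference between the sample and population versions is the divisor (n − 1 versus N).

```python
def variance_definition(xs, sample=True):
    """Sum of squared deviations from the mean, divided by n-1 (sample) or N."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (n - 1 if sample else n)

def variance_computation(xs, sample=True):
    """(sum of x^2 minus (sum of x)^2 / n), divided by n-1 (sample) or N."""
    n = len(xs)
    numerator = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return numerator / (n - 1 if sample else n)

days = [10, 60, 50, 30, 40, 20]   # Drug A cure times from the chart

s2_def = variance_definition(days)                    # sample variance
s2_comp = variance_computation(days)
sigma2_def = variance_definition(days, sample=False)  # population variance
sigma2_comp = variance_computation(days, sample=False)

print(s2_def, s2_comp)  # 350.0 350.0
```

The computation form avoids working out each deviation by hand, which is why it is convenient for manual calculation; in floating-point code the definition form is usually the numerically safer of the two.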
Standard Deviation
• The standard deviation is the square root of the
variance.

$$ s = \sqrt{s^2} $$
Example – Using Formula
• Find the variance of the following
dataset 6, 3, 8, 5, 3 (in hours)
  x     x²
  6     36
  3      9
  8     64
  5     25
  3      9
Σx = 25     Σx² = 143

$$ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{143 - \frac{25^2}{5}}{4} = \frac{143 - 125}{4} = \frac{18}{4} = 4.5 $$
Find the standard deviation
• The standard deviation is the square root of the
variance.

$$ s = \sqrt{4.5} \approx 2.12\ \text{hours} $$
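The worked example can be checked step by step. This is a sketch using the computation formula for the sample variance, with a cross-check against Python's standard `statistics` module; the variable names are chosen for the illustration.

```python
import math
import statistics

hours = [6, 3, 8, 5, 3]
n = len(hours)

sum_x = sum(hours)                    # sum of x
sum_x2 = sum(x * x for x in hours)    # sum of x squared

# Computation formula for the sample variance.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)                     # standard deviation = square root of variance

print(sum_x, sum_x2)     # 25 143
print(s2, round(s, 2))   # 4.5 2.12

# Cross-check against the standard library's sample variance and deviation.
assert math.isclose(s2, statistics.variance(hours))
assert math.isclose(s, statistics.stdev(hours))
```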
