
The Range (Statistics)

The Range is the difference between the lowest and highest values.

Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9.

So the range is 9 − 3 = 6.

It is that simple!

But perhaps too simple ...

The Range Can Be Misleading


The range can sometimes be misleading when there are extremely high or low
values.

Example: In {8, 11, 5, 9, 7, 6, 3616}:

 the lowest value is 5,


 and the highest is 3616,

So the range is 3616 − 5 = 3611.

The single value of 3616 makes the range large, but most values are around 10.

So we may be better off using the Interquartile Range or the Standard Deviation.
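
Both examples above can be checked with a few lines of Python. This is only a minimal sketch: the function name `value_range` is an illustrative choice, not something from the text.

```python
def value_range(values):
    """Return the difference between the highest and lowest values."""
    return max(values) - min(values)


print(value_range([4, 6, 9, 3, 7]))            # 6
print(value_range([8, 11, 5, 9, 7, 6, 3616]))  # 3611
```

Notice how the single value 3616 decides the second result on its own.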


Quartiles
Quartiles are the values that divide a list of numbers into quarters:

 Put the list of numbers in order


 Then cut the list into four equal parts
 The Quartiles are at the "cuts"

Like this:

Example: 5, 7, 4, 4, 6, 2, 8

Put them in order: 2, 4, 4, 5, 6, 7, 8

Cut the list into quarters:

And the result is:

 Quartile 1 (Q1) = 4
 Quartile 2 (Q2), which is also the Median, = 5
 Quartile 3 (Q3) = 7

Sometimes a "cut" is between two numbers ... the Quartile is the average of the
two numbers.

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are already in order

Cut the list into quarters:

In this case Quartile 2 is half way between 5 and 6:

Q2 = (5+6)/2 = 5.5
And the result is:

 Quartile 1 (Q1) = 3
 Quartile 2 (Q2) = 5.5
 Quartile 3 (Q3) = 7

Interquartile Range
The "Interquartile Range" is from Q1 to Q3:

To calculate it just subtract Quartile 1 from Quartile 3, like this:

Example: for 2, 4, 4, 5, 6, 7, 8 (where Q1 = 4 and Q3 = 7)

The Interquartile Range is:

Q3 − Q1 = 7 − 4 = 3
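
The "cut the list" method can be sketched in Python as below. It assumes the convention used in the examples here: when the list has an odd number of values, the middle value is Q2 and is left out of both halves. The function name `quartiles` is hypothetical, and other software may use a different convention and give slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) by cutting the sorted list at its middle."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values before the middle cut
    upper = data[len(data) - half:]   # values after the middle cut
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles([5, 7, 4, 4, 6, 2, 8])
print(q1, q2, q3)         # 4 5 7
print("IQR =", q3 - q1)   # IQR = 3
```

When a cut falls between two numbers, `median` averages them, so `quartiles([1, 3, 3, 4, 5, 6, 6, 7, 8, 8])` gives 3, 5.5 and 7.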

Box and Whisker Plot


We can show all the important values in a "Box and Whisker Plot", like this:

A final example covering everything:

Example: Box and Whisker Plot and Interquartile Range for

4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11

Put them in order:


3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18

Cut it into quarters:

3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18

In this case all the quartiles are between numbers:

 Quartile 1 (Q1) = (4+4)/2 = 4


 Quartile 2 (Q2) = (10+11)/2 = 10.5
 Quartile 3 (Q3) = (14+16)/2 = 15

Also:

 The Lowest Value is 3,


 The Highest Value is 18

So now we have enough data for the Box and Whisker Plot:

And the Interquartile Range is:

Q3 − Q1 = 15 − 4 = 11
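
The five values a Box and Whisker Plot needs can be gathered in one step. The sketch below assumes the same "cut the list" convention as the worked example; `five_number_summary` is an illustrative name.

```python
from statistics import median


def five_number_summary(values):
    """Return (lowest, Q1, Q2, Q3, highest) for a list of numbers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])              # middle of the lower half
    q3 = median(data[len(data) - half:])  # middle of the upper half
    return data[0], q1, median(data), q3, data[-1]


data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]
low, q1, q2, q3, high = five_number_summary(data)
print(low, q1, q2, q3, high)        # 3 4.0 10.5 15.0 18
print("IQR =", q3 - q1)             # IQR = 11.0
```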

Percentiles
Percentile: the value below which a percentage of data falls.

Example: You are the fourth tallest person in a group of 20

16 of the 20 people are shorter than you, so 80% of people are shorter than you:

That means you are at the 80th percentile.

If your height is 1.85m then "1.85m" is the 80th percentile height in that group.
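
Here is a small sketch of that calculation. The list of heights is made up purely for illustration (the text only gives your own height), and `percentile_below` is a hypothetical helper name.

```python
def percentile_below(values, x):
    """Percentage of values that are strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical heights in cm for a group of 20; 185 is the fourth tallest.
heights = [150, 152, 155, 157, 158, 160, 162, 163, 165, 167,
           168, 170, 172, 175, 178, 180, 185, 188, 190, 195]
print(percentile_below(heights, 185))  # 80.0
```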
In Order
Have the data in order, so you know which values are above and below.

 To calculate percentiles of height: have the data in height order (sorted by height).
 To calculate percentiles of age: have the data in age order.
 And so on.

Grouped Data
When the data is grouped:

Add up all percentages below the score, plus half the percentage at the score.

Example: You Score a B!

In the test 12% got D, 50% got C, 30% got B and 8% got A

You got a B, so add up

 all the 12% that got D,


 all the 50% that got C,
 half of the 30% that got B,

for a total percentile of 12% + 50% + 15% = 77%

In other words you did "as well as or better than 77% of the class"
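
The grouped-data rule can be sketched like this. It assumes the groups are listed from the lowest score to the highest, and `grouped_percentile` is an illustrative name.

```python
def grouped_percentile(groups, score):
    """groups: list of (label, percent) pairs, lowest score first."""
    total_below = 0.0
    for label, percent in groups:
        if label == score:
            # everything below the score, plus half of the score's own group
            return total_below + percent / 2
        total_below += percent
    raise ValueError(f"unknown score: {score!r}")


grades = [("D", 12), ("C", 50), ("B", 30), ("A", 8)]
print(grouped_percentile(grades, "B"))  # 77.0
```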

(Why take half of B? Because you shouldn't imagine you got the "Best B", or the
"Worst B", just an average B.)
Deciles
Deciles are similar to Percentiles (sounds like decimal and percentile together),
as they split the data into 10% groups:

 The 1st decile is the 10th percentile (the value that divides the data so
that 10% is below it)
 The 2nd decile is the 20th percentile (the value that divides the data so
that 20% is below it)
 etc!
Example: (continued)

You are at the 8th decile (the 80th percentile).

Quartiles
Another related idea is Quartiles, which split the data into quarters:

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are in order. Cut the list into quarters:

In this case Quartile 2 is half way between 5 and 6:

Q2 = (5+6)/2 = 5.5

And the result is:

 Quartile 1 (Q1) = 3
 Quartile 2 (Q2) = 5.5
 Quartile 3 (Q3) = 7
The Quartiles also divide the data into divisions of 25%, so:

 Quartile 1 (Q1) can be called the 25th percentile


 Quartile 2 (Q2) can be called the 50th percentile
 Quartile 3 (Q3) can be called the 75th percentile
Example: (continued)

For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:

 The 25th percentile = 3


 The 50th percentile = 5.5
 The 75th percentile = 7

Estimating Percentiles
We can estimate percentiles from a line graph.

Example: Shopping

A total of 10,000 people visited the shopping mall over 12 hours:

Time (hours)    People
0               0
2               350
4               1,100
6               2,400
8               6,500
10              8,850
12              10,000

a) Estimate the 30th percentile (when 30% of the visitors had arrived).

b) Estimate what percentile of visitors had arrived after 11 hours.

First draw a line graph of the data: plot the points and join them with a smooth
curve:

a) The 30th percentile occurs when the visits reach 3,000.

Draw a line horizontally across from 3,000 until you hit the curve, then draw a
line vertically downwards to read off the time on the horizontal axis:

So the 30th percentile occurs after about 6.5 hours.

b) To estimate the percentile of visits after 11 hours: draw a line vertically up
from 11 until you hit the curve, then draw a line horizontally across to read off
the number of visitors on the vertical axis:

So the visits at 11 hours were about 9,500, which is the 95th percentile.
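
Without a graph, a rough estimate can be made by joining the points in the table with straight lines. Note this is a different (simpler) technique than reading a smooth curve, so the answers differ a little. The helper name `interpolate` is an assumption for this sketch.

```python
# Running total of visitors at each time, taken from the table.
hours = [0, 2, 4, 6, 8, 10, 12]
people = [0, 350, 1100, 2400, 6500, 8850, 10000]


def interpolate(x, xs, ys):
    """Straight-line estimate of y at x, where xs is in increasing order."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the data range")


total = people[-1]
# a) the time at which 30% of the visitors had arrived
t30 = interpolate(30 * total / 100, people, hours)
# b) the percentile reached after 11 hours
p11 = 100 * interpolate(11, hours, people) / total
print(t30, p11)
```

The straight-line estimates come out at roughly 6.3 hours and about the 94th percentile, close to the 6.5 hours and 95th percentile read from the smooth curve.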

Mean Deviation
How far, on average, all values are from the middle.
Calculating It
Find the mean of all values ... use it to work out distances ... then find the
mean of those distances!

In three steps:

 1. Find the mean of all values


 2. Find the distance of each value from that mean (subtract the mean from each
value, ignore minus signs)
 3. Then find the mean of those distances

Like this:

Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16

Step 1: Find the mean:

Mean = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

Step 2: Find the distance of each value from that mean:

Value    Distance from 9
3        6
6        3
6        3
7        2
8        1
11       2
15       6
16       7

Which looks like this:

(No minus signs!)

Step 3. Find the mean of those distances:

Mean Deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7) / 8 = 30 / 8 = 3.75

So, the mean = 9, and the mean deviation = 3.75

It tells us how far, on average, all values are from the middle.

In that example the values are, on average, 3.75 away from the middle.

For deviation just think distance

Formula
The formula is:

Mean Deviation = Σ|x − μ| / N

 Σ is Sigma, which means to sum up


 || (the vertical bars) mean Absolute Value, basically to ignore minus signs
 x is each value (such as 3 or 16)
 μ is the mean (in our example μ = 9)
 N is the number of values (in our example N = 8)

Let's look at those in more detail:

Absolute Deviation
Each distance we calculate is called an Absolute Deviation, because it is
the Absolute Value of the deviation (how far from the mean).

To show "Absolute Value" we put "|" marks either side like this:

|-3| = 3

For any value x:

Absolute Deviation = |x - μ|

From our example, the value 16 has Absolute Deviation = |x - μ| = |16 - 9| = |7| = 7

And now let's add them all up ...

Sigma
The symbol for "Sum Up" is Σ (called Sigma Notation), so we have:

Sum of Absolute Deviations = Σ|x - μ|

Divide by how many values N and we have:

Mean Deviation = Σ|x − μ| / N

Let's do our example again, using the proper symbols:

Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16

Step 1: Find the mean:

μ = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

Step 2: Find the Absolute Deviations:

x      |x - μ|
3      6
6      3
6      3
7      2
8      1
11     2
15     6
16     7

Σ|x - μ| = 30

Step 3. Find the Mean Deviation:

Mean Deviation = Σ|x - μ| / N = 30 / 8 = 3.75

Note: the mean deviation is sometimes called the Mean Absolute Deviation
(MAD) because it is the mean of the absolute deviations.
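
The three steps translate almost line for line into Python. This is a minimal sketch and the function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Mean of the absolute distances of each value from the mean."""
    mu = sum(values) / len(values)              # Step 1: the mean
    distances = [abs(x - mu) for x in values]   # Step 2: absolute deviations
    return sum(distances) / len(distances)      # Step 3: mean of the distances


print(mean_deviation([3, 6, 6, 7, 8, 11, 15, 16]))  # 3.75
```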

What Does It "Mean"?


Mean Deviation tells us how far, on average, all values are from the middle.

Here is an example (using the same data as on the Standard Deviation page):
Example: You and your friends have just measured the heights of
your dogs (in millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.

Step 1: Find the mean:

μ = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

Step 2: Find the Absolute Deviations:

x      |x - μ|
600    206
470    76
170    224
430    36
300    94

Σ|x - μ| = 636

Step 3. Find the Mean Deviation:


Mean Deviation = Σ|x - μ| / N = 636 / 5 = 127.2

So, on average, the dogs' heights are 127.2 mm from the mean.

(Compare that with the Standard Deviation of 147 mm)

A Useful Check
The deviations on one side of the mean should equal the deviations on
the other side.

From our first example:

Example: 3, 6, 6, 7, 8, 11, 15, 16

The deviations are:

6+3+3+2+1 = 2+6+7

15 = 15

Likewise:

Example: Dogs

Deviations left of mean: 224 + 94 = 318

Deviations right of mean: 206 + 76 + 36 = 318

If they are not equal ... you may have made a mistake!
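
The check is easy to automate. In this sketch `deviation_sides` is a hypothetical helper that totals the distances on each side of the mean.

```python
def deviation_sides(values):
    """Return (total distance below the mean, total distance above it)."""
    mu = sum(values) / len(values)
    below = sum(mu - x for x in values if x < mu)
    above = sum(x - mu for x in values if x > mu)
    return below, above


print(deviation_sides([3, 6, 6, 7, 8, 11, 15, 16]))  # (15.0, 15.0)
print(deviation_sides([600, 470, 170, 430, 300]))    # (318.0, 318.0)
```

When the mean is not a whole number the two totals can differ by a tiny rounding error, so in that case it is safer to compare them with `math.isclose` rather than `==`.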
Standard Deviation and Variance
Deviation just means how far from the normal

Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma)

The formula is easy: it is the square root of the Variance. So now you ask,
"What is the Variance?"

Variance
The Variance is defined as:

The average of the squared differences from the Mean.

To calculate the variance follow these steps:

 Work out the Mean (the simple average of the numbers)


 Then for each number: subtract the Mean and square the result (the squared
difference).
 Then work out the average of those squared differences. (Why Square?)

Example
You and your friends have just measured the heights of your dogs (in
millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.

Find out the Mean, the Variance, and the Standard Deviation.

Your first step is to find the Mean:

Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5

= 1970 / 5

= 394

so the mean (average) height is 394 mm. Let's plot this on the chart:

Now we calculate each dog's difference from the Mean:


To calculate the Variance, take each difference, square it, and then average the
result:

Variance

σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5

= (42436 + 5776 + 50176 + 1296 + 8836) / 5

= 108520 / 5

= 21704

So the Variance is 21,704

And the Standard Deviation is just the square root of Variance, so:

Standard Deviation

σ = √21704

= 147.32...

= 147 (to the nearest mm)
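
The same calculation as a short Python sketch, treating the 5 dogs as the whole Population (so dividing by N). The function name `population_variance` is an illustrative choice.

```python
from math import sqrt


def population_variance(values):
    """Average of the squared differences from the mean (divides by N)."""
    mu = sum(values) / len(values)
    squared_diffs = [(x - mu) ** 2 for x in values]
    return sum(squared_diffs) / len(values)


heights = [600, 470, 170, 430, 300]   # dog heights in mm
variance = population_variance(heights)
std_dev = sqrt(variance)
print(variance)            # 21704.0
print(round(std_dev, 2))   # 147.32
```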


And the good thing about the Standard Deviation is that it is useful. Now we can
show which heights are within one Standard Deviation (147mm) of the Mean:

So, using the Standard Deviation we have a "standard" way of knowing what is
normal, and what is extra large or extra small.
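
As a sketch of that idea, the code below sorts the dog heights into those within one Standard Deviation of the Mean and those outside it. It uses `mean` and `pstdev` (the population standard deviation) from Python's standard `statistics` module.

```python
from statistics import mean, pstdev

heights = [600, 470, 170, 430, 300]   # dog heights in mm
mu = mean(heights)
sigma = pstdev(heights)               # population standard deviation

within = [h for h in heights if abs(h - mu) <= sigma]
outside = [h for h in heights if abs(h - mu) > sigma]
print("within:", within)     # within: [470, 430, 300]
print("outside:", outside)   # outside: [600, 170]
```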

Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell
them!

Using

When the data follows a roughly Normal Distribution, we can expect about 68% of
values to be within plus-or-minus 1 standard deviation of the mean.

Read Standard Normal Distribution to learn more.

Also try the Standard Deviation Calculator.

But ... there is a small change with Sample Data
Our example has been for a Population (the 5 dogs are the only dogs we are
interested in).
But if the data is a Sample (a selection taken from a bigger Population), then
the calculation changes!

When you have "N" data values that are:

 The Population: divide by N when calculating Variance (like we did)


 A Sample: divide by N-1 when calculating Variance

All other calculations stay the same, including how we calculated the mean.

Example: if our 5 dogs are just a sample of a bigger population of dogs, we
divide by 4 instead of 5 like this:

Sample Variance = 108,520 / 4 = 27,130

Sample Standard Deviation = √27,130 = 164.71... = 165 (to the nearest mm)

Think of it as a "correction" when your data is only a sample.
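
The only difference between the two calculations is the number we divide by, which the sketch below makes explicit. The `sample` flag is an illustrative parameter; Python's standard `statistics` module offers `pvariance` and `variance` for the same two cases.

```python
from math import sqrt


def variance(values, sample=False):
    """Divide by N for a Population, or by N-1 for a Sample."""
    n = len(values)
    mu = sum(values) / n
    total = sum((x - mu) ** 2 for x in values)
    return total / (n - 1 if sample else n)


heights = [600, 470, 170, 430, 300]   # dog heights in mm
print(variance(heights))                                # 21704.0
print(variance(heights, sample=True))                   # 27130.0
print(round(sqrt(variance(heights, sample=True)), 2))   # 164.71
```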

Formulas
Here are the two formulas, explained at Standard Deviation Formulas if you
want to know more:

The "Population Standard Deviation":

σ = √( Σ(x − μ)² / N )

The "Sample Standard Deviation":

s = √( Σ(x − x̄)² / (N − 1) )

(where x̄ is the mean of the sample)

Looks complicated, but the important change is to divide by N-1 (instead of N)
when calculating a Sample Variance.
*Footnote: Why square the differences?
If we just add up the differences from the mean ... the negatives cancel the
positives:

(4 + 4 − 4 − 4) / 4 = 0

So that won't work. How about we use absolute values?

(|4| + |4| + |−4| + |−4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4

That looks good (and is the Mean Deviation), but what about this case:

(|7| + |1| + |−6| + |−2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4

Oh No! It also gives a value of 4, even though the differences are more spread
out.
So let us try squaring each difference (and taking the square root at the end):

√((4² + 4² + 4² + 4²) / 4) = √(64 / 4) = 4

√((7² + 1² + 6² + 2²) / 4) = √(90 / 4) = 4.74...


That is nice! The Standard Deviation is bigger when the differences are more
spread out ... just what we want.

In fact this method is a similar idea to distance between points, just applied in
a different way.
And it is easier to use algebra on squares and square roots than absolute
values, which makes the standard deviation easy to use in other areas of
mathematics.
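
The comparison in this footnote can be reproduced with a short sketch. The two helper names are illustrative, and the lists hold the differences from the mean used in the footnote.

```python
from math import sqrt


def mean_absolute(diffs):
    """Average of the differences with minus signs ignored."""
    return sum(abs(d) for d in diffs) / len(diffs)


def root_mean_square(diffs):
    """Square each difference, average them, then take the square root."""
    return sqrt(sum(d * d for d in diffs) / len(diffs))


even = [4, 4, -4, -4]
spread = [7, 1, -6, -2]
print(mean_absolute(even), mean_absolute(spread))    # 4.0 4.0
print(root_mean_square(even))                        # 4.0
print(round(root_mean_square(spread), 2))            # 4.74
```

Only the squared version tells the two sets apart.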
