
Further Mathematics: Univariate Data Ringwood SC 2019

Further Mathematics 2019: Unit 3 & 4 Examples answered


DATA ANALYSIS (CORE) – UNIVARIATE DATA

CHAPTER 1: Displaying and Describing Data Distributions

Exercise 1A: Classifying Data

Variables
In statistics, the quantities about which we record information are called variables.

Types of data: numerical and categorical

DATA
NUMERICAL DATA: the variable is a number that comes from measuring or counting.
    CONTINUOUS: we ask “How much?”
    DISCRETE: we ask “How many?”
CATEGORICAL DATA: the variable is a word, number or symbol arising from a person or object belonging to a category.
    NOMINAL: data cannot be logically ordered.
    ORDINAL: data can be logically ordered.

Numerical data arises when the information recorded about some variable is a number that comes from measuring or counting some quantity. Numerical data comes in two types, discrete and continuous. For discrete data we count and ask the question “How many?”. For continuous data we measure and ask the question “How much?”

eg. Numerical discrete – number of people living in your house

eg. Numerical continuous – number of hours you watch TV, height, weight, length

Categorical data arises when the information recorded about a variable is a word, number or symbol arising from classifying a person or object as belonging to a particular category.
(NB: arithmetic operations such as averaging, addition, subtraction, multiplication and division cannot be meaningfully applied to numbers that are categorical.)

eg. Place you came in a race (first, second, third), favourite ice cream, type of pet


Example A1
Classify each of the following as categorical or numerical data. If the data is numerical, further
classify the data as discrete or continuous. If it’s categorical, classify the data as ordinal or
nominal.

DATA                                                           TYPE
TIME SPENT IN SHOWER                                           Numerical, continuous
SALARY                                                         Numerical, discrete
WEIGHT (1 = underweight, 2 = normal weight, 3 = overweight)    Categorical, ordinal
AGE GROUP (teens, twenties or thirties)                        Categorical, ordinal
EYE COLOUR                                                     Categorical, nominal
WEIGHT (kilograms)                                             Numerical, continuous
POST CODE                                                      Categorical, nominal

Exercise 1A: 1,2,3,4,5,6

Exercise 1B: Displaying and Describing Categorical and Numerical Data

Frequency Tables

A frequency table is a listing of the values a variable takes in a data set, along with how often
(frequently) each value occurs. It can be used for both numerical and categorical data.
Frequency can be recorded as a:
● Count: number of times a value occurs, or
● Percent: percentage of times a value occurs.
The tables are set out as follows:

Category    Frequency count    Percentage frequency

Percentage frequency is calculated as follows:

$\text{Percentage frequency} = \dfrac{\text{Frequency}}{\text{Total number}} \times 100\%$


Example B1 (Numerical Data)

The family sizes of 11 preschool children are as follows:

3 3 4 4 5 3 2 4 3 5 3

Display the data in the form of a frequency table. Round percentages to one decimal place.

FAMILY SIZE    FREQUENCY    PERCENTAGE FREQUENCY
2              1            1 ÷ 11 × 100 = 9.1%
3              5            5 ÷ 11 × 100 = 45.5%
4              3            3 ÷ 11 × 100 = 27.3%
5              2            2 ÷ 11 × 100 = 18.2%
TOTAL          11
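As a cross-check for Example B1, the sketch below rebuilds the frequency table in Python (our choice of language; the course itself uses the CAS). It counts each value and applies the percentage frequency formula. The function name `frequency_table` is hypothetical, not a CAS command.

```python
from collections import Counter

def frequency_table(data, dp=1):
    """Return (value, count, percentage frequency) rows in ascending order of value."""
    counts = Counter(data)
    total = len(data)
    rows = []
    for value in sorted(counts):
        # percentage frequency = frequency / total number x 100%
        percent = round(counts[value] / total * 100, dp)
        rows.append((value, counts[value], percent))
    return rows

family_sizes = [3, 3, 4, 4, 5, 3, 2, 4, 3, 5, 3]
for value, count, percent in frequency_table(family_sizes):
    print(f"{value:>4} {count:>4} {percent:>6}%")
print("TOTAL", len(family_sizes))
```

Because each percentage is rounded separately, the rounded percentages can add to slightly more or less than 100% (here 9.1 + 45.5 + 27.3 + 18.2 = 100.1%).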

The Mode

In a frequency table, the mode is the data value (or range of data values) that occurs most often, that is, the value with the greatest frequency.

For grouped data it is called the modal class; for categorical data, the modal category.
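Python's standard library function `statistics.multimode` (Python 3.8 and later) returns every value with the greatest frequency, so it also reports ties such as a bimodal data set. A minimal sketch using the data from Examples B1 and B2:

```python
from statistics import multimode

family_sizes = [3, 3, 4, 4, 5, 3, 2, 4, 3, 5, 3]
votes = ["Greens", "Labor", "Labor", "Labor", "Liberal",
         "Labor", "Liberal", "Labor", "Labor", "Liberal"]

print(multimode(family_sizes))  # [3]
print(multimode(votes))         # ['Labor']
print(multimode([1, 1, 2, 2]))  # [1, 2]: two modes
```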

Barcharts

A bar chart displays, in graphical form, the information contained in a frequency table of categorical data.

Constructing a bar chart from a frequency table:
● Frequency (or % frequency) on the vertical axis
● Variable on the horizontal axis
● Height of each bar gives the frequency (or % frequency)
● Bars are drawn with gaps between them to show that each value is a separate category.

Note: A bar chart has a gap between the vertical axis and the first bar, and gaps between the bars. Frequency is always on the vertical axis.


Example B2 (Categorical Data)


10 voters were asked which party they voted for on election day:
Greens, Labor, Labor, Labor, Liberal, Labor, Liberal, Labor, Labor, Liberal
a) Display the data in the form of a frequency table.

PARTY      FREQUENCY
Liberal    3
Greens     1
Labor      6
TOTAL      10

b) Display the data from your frequency table in a bar chart.

Stacked or Segmented Barcharts


In a stacked or segmented barchart, the bars are stacked one on another to give a single bar
with several components.

The vertical axis measures either the frequency or percentage frequency.

Example B3
According to the segmented bar chart below, what percentage of days was Melbourne’s climate
recorded as moderate?

About 65%


Exercise 1B: 2, 4, 6
Exercise 1C: Displaying and describing the distributions of numerical variables
Grouped Frequency Distribution
Sometimes the most practical way to display data is to group it into class intervals, for instance ages. Listing every possible age would be tedious, so we might use the intervals 15–19, 20–24, 25–29, etc.

Grouping is also used for continuous data such as heights and weights. We generally try to group
our data into between 5 and 10 class intervals.

Discrete data, eg 15–19, 20–24, 25–29
Continuous data, eg 165–, 170–, 175– (where 165– means from 165 up to, but not including, 170)

Example C1
A local gym recorded the ages of thirty people who were in a cycle class. The results are as
follows:
32 18 20 22 24 52 45 36 28 27
39 25 24 19 51 20 22 26 25 30
19 18 30 32 17 28 25 19 31 28

Form a grouped frequency table with a class width of 5.

Lowest number = 17
Highest number = 52

Age      Frequency    Midpoint
15–19    6            17
20–24    6            22
25–29    8            27
30–34    5            32
35–39    2            37
40–44    0            42
45–49    1            47
50–54    2            52
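A Python sketch of Example C1, assuming whole-number (discrete) data so that the class 15–19 contains 15, 16, 17, 18 and 19. `grouped_frequency` is a hypothetical helper: `start` is the lower end of the first class and `width` is the class width.

```python
def grouped_frequency(data, start, width):
    """Return (lower, upper, frequency, midpoint) rows for whole-number data.

    Classes run start..start+width-1, then the next class, and so on,
    until the class containing the largest value.
    """
    rows = []
    lower = start
    while lower <= max(data):
        upper = lower + width - 1  # eg 15-19 when width = 5
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq, (lower + upper) / 2))
        lower += width
    return rows

ages = [32, 18, 20, 22, 24, 52, 45, 36, 28, 27,
        39, 25, 24, 19, 51, 20, 22, 26, 25, 30,
        19, 18, 30, 32, 17, 28, 25, 19, 31, 28]

for lower, upper, freq, mid in grouped_frequency(ages, start=15, width=5):
    print(f"{lower}-{upper}  {freq}  {mid:g}")
```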


Histograms
A histogram is a way of representing the information contained in a frequency table containing
numerical data in graphical form. Note: In a bar chart, the ‘bars’ are separated by a space and
they are used for categorical data. Histograms have ‘bars’ which touch and they are used for
numerical data. Frequency is shown on the vertical axis of both.

Note: When you are working with grouped data, the first number in each class interval is written at the start (left edge) of its bar. If you’re working with discrete (ungrouped) data, the number is written at the middle of the bar.

Use the information in the frequency table to construct a histogram to display the distribution.

Describing histograms
“SOCS” = Shape, Outliers, Centre, Spread
In describing histograms, we discuss:
1. Shape – see below – ignore any outliers when you are determining the shape!
2. Outliers (values that are considerably higher or lower than the bulk of the data)
3. Centre (the median is the $\frac{n+1}{2}$th value) – you need to know how to find this in histograms
4. Spread (range = highest score – lowest score)
**ALL of these features must be discussed when describing a histogram distribution.

Shape
[Figure: example histograms (frequency on the vertical axis) showing each shape]

Symmetric distribution
● Single peaked
● Tails off relatively evenly either side

Positively skewed
● Tails off to the right
● Mean > Median

Negatively skewed
● Tails off to the left
● Mean < Median

Bimodal distribution
● Double peaked


Outliers
Outliers are values that stand out from the main body of data.
[Figure: histogram (frequency on the vertical axis) with an outlier well separated from the main body of data]

Centre
The centre (as measured by the median) divides the histogram into two equal areas.
[Figure: two histograms (frequency on the vertical axis), each divided into two equal areas by the median]

Median location: if n is the number of data values, the median is the $\frac{n+1}{2}$th value.

Spread

Spread (as measured by the range) indicates how tightly or loosely the data values in a distribution are clustered.
[Figure: two histograms (frequency on the vertical axis)]
● Left: data loosely clustered (large spread)
● Right: data closely clustered (small spread)


Plotting Histograms on the calculator

Step 1: In the Statistics screen, enter all data values into list1.
Step 2: Tap the graph settings icon (alternatively, tap “SetGraph > Setting”).
Step 3: Make sure “Draw” is set to “On”, then select:
Type: Histogram
XList: list1
Freq: 1
Tap “Set”.
Step 4: Tap the graph icon.
Step 5: “HStart” is the value you want your first interval to begin from, typically a “round” number equal to or below the lowest value in your data. “HStep” is the size of each interval (usually 5 or 10, depending on the spread of the data). Select these values, then tap “OK”.
Step 6: Tap the trace icon (alternatively, tap “Analysis > Trace”), then scroll left and right using the cursor keys to show values on the histogram.

Writing a Report: Describing Histograms


For the distribution of (variable name), the data is (shape description) with (no outliers OR outlier(s) at …).

The centre of the distribution, as measured by the median, lies in the interval _________, and the spread of the distribution is ______________, as measured by the range.


Example C2
This histogram shows the distribution of life expectancy for 183 different countries.

Complete the report below for this histogram.

For the distribution of life expectancy for these 183 countries, the data is negatively skewed with no apparent outliers.
The centre of the distribution, as measured by the median, is between 65 and 70 years, and the spread of the distribution is approximately 35 years, as measured by the range.

Exercise 1C: 1, 2, 3, 5, 7, 8, 9


Exercise 1D: Using a log scale to display data


Significant figures
When rounding to a given number of significant figures, we keep only the digits in the number that are regarded as “significant”.

The rules for significant figures are:
● All non-zero digits are significant.
● Leading zeros are not significant (they are placeholders) – for integers and decimal numbers.
● Zeros between other significant digits are significant.
● Trailing zeros in a number that contains a decimal point are significant.
● Trailing zeros in an integer are not significant (unless specified otherwise).
Example D1
How many significant figures are there in each of these numbers?

a) 0.003 561: 4
b) 70.036: 5
c) 5.320: 4
d) 5320: 3
e) 450 000: 2
f) 78 000.0: 6
g) 78 000: 2

Rounding with Significant figures


As when rounding to a given number of decimal places, when rounding to a given number of
significant figures consider the digit after the specified number of figures.

If it is 5 or above, round the final digit up; if it is 4 or below, keep the final digit as is.

For example:

5067.37 rounded to 2 significant figures is 5100
3199.01 rounded to 4 significant figures is 3199
0.004931 rounded to 3 significant figures is 0.004 93
1 020 004 rounded to 2 significant figures is 1 000 000
32 rounded to 4 significant figures is 32.00
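The rounding rule above can be sketched in Python with the standard `decimal` module. This assumes the value is supplied as a string (so binary floating-point error cannot disturb the digit after the cut-off) and uses `ROUND_HALF_UP` to match “5 or above, round up”. The helper name `round_sig` is our own.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value, sig):
    """Round value (a numeric string) to sig significant figures, rounding 5 up."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the power of ten of the first significant digit,
    # eg 5067.37 -> 3 and 0.004931 -> -3
    exponent = d.adjusted() - sig + 1
    return d.quantize(Decimal(f"1e{exponent}"), rounding=ROUND_HALF_UP)

examples = [("5067.37", 2), ("3199.01", 4), ("0.004931", 3), ("1020004", 2), ("32", 4)]
for value, sig in examples:
    # format "f" prints plain digits rather than scientific notation (5100, not 5.1E+3)
    print(value, "->", format(round_sig(value, sig), "f"))
```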

Worksheet: Rounding
For extra practice: http://studymaths.co.uk/workout.php?workoutID=62
(Redo until you get 10/10)


Using a log10 scale


Sometimes a data set will contain data points that vary so much in size that plotting them using a
traditional scale becomes very difficult.

A way to overcome this is to write the numbers in logarithmic (log) form. The log of a number is
the power of 10 which creates this number.

$\log_{10}(10) = \log_{10}(10^1) = 1$
$\log_{10}(100) = \log_{10}(10^2) = 2$
$\log_{10}(1000) = \log_{10}(10^3) = 3$


Worked example
The histogram below displays the body weights (in kg) of a number of animal species. Because
the animals represented in this dataset have weights ranging from around 1kg to 90 tonnes (a
dinosaur), most of the data are bunched up at one end of the scale and much detail is missing.
The distribution of weights is highly positively skewed, with an outlier.

However, when a log scale is used, their weights are much more evenly spread along the scale.
The distribution is now approximately symmetric, with no outliers, and the histogram is
considerably more informative. We can now see that the percentage of animals with weights
between 10 and 100kg is similar to the percentage of animals with weights between 100 and
1000 kg.


Converting values using log base 10

To convert a “real” value into a log scale value:


log10(actual value) = log scale value

To convert a log scale value back into a “real” value:


10 ^ (log scale value) = actual value
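The two conversions above, sketched in Python with the standard `math.log10` function and the `**` power operator (the helper names are hypothetical). The values used are those from Example D2.

```python
import math

def to_log_scale(actual_value):
    """log10(actual value) = log scale value; actual_value must be positive."""
    return math.log10(actual_value)

def from_log_scale(log_value):
    """10 ^ (log scale value) = actual value."""
    return 10 ** log_value

print(to_log_scale(75000))                        # approximately 4.875061
print(from_log_scale(6.3))                        # approximately 1995262.3
print(from_log_scale(3.0) / from_log_scale(2.0))  # 10.0
```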

Using CAS for logs


To convert a single data value:

Actual value to log scale value: (use ) Log scale value to actual value

To convert an entire set of data


Use the Statistics screen as shown:


Example D2
The Richter scale is a log base 10 scale used to measure the size of earthquakes.
a) An earthquake is recorded with a raw value of 75,000. What is this value on the Richter scale,
correct to 3 significant figures?
log10(75 000) = 4.875061… = 4.88 (to 3 s.f.)

b) What is the raw value, correct to 4 significant figures, of an earthquake which is recorded at
6.3 on the Richter scale?
$10^{6.3} = 1\,995\,262.315\ldots = 1\,995\,000$ (to 4 s.f.)

c) Show that an earthquake that measures 3.0 on the Richter scale is ten times stronger than an
earthquake that measures 2.0 on the Richter scale.
$10^2 = 100$ and $10^3 = 1000$, so $\dfrac{1000}{100} = 10$; therefore it is 10 times stronger.

Example D3

a) positive skew

b) approximately symmetric

Exercise 1D: 1aceg, 2, 3, 4


EXAM QUESTIONS

log10(10) = 1

There is 1 country above 1.0 on the log scale histogram.

1 ÷ 58 × 100 = 1.72…% ≈ 2%

B is correct

log10(1) = 0

There are 9 + 1 = 10 values above 0 on the log scale histogram.

E is correct

Chapter 1 Review: MCQ 1-17, EXT 1-4


CHAPTER 2: Summarising Numerical Data

Exercise 2A: Dot Plots and Stem plots


Dot Plots
A dot plot is a type of graph which consists of a number line with each data point marked by a
dot. It is suitable for small sets of data only. It can be interpreted in the same way as a histogram.

Example A1
The dot plot below shows the number of hours students in a class spend on homework each
week. What is the median number of homework hours per week?

Stem and Leaf Plots (Stem plots)


A stem plot is an alternative to a histogram. It is useful for displaying relatively small sets of data (fewer than 50 values) and has the advantage of retaining all the original data values. Like a histogram, the stem plot gives information about the shape, outliers, centre and spread of the distribution. Remember to include a key.

Note: always check the key!

Example A2
Prepare an ordered stem and leaf plot for the following set of scores.

12 45 67 45 34 54 87 86 80 40 23 48 69 71

**Hint – you can use your CAS to order the data (Stats – List 1 – Edit – Sort ascending).

Key: 1|2 = 12
Stem Leaf
1 2
2 3
3 4
4 0 5 5 8
5 4
6 7 9
7 1
8 0 6 7

Note: The histogram report template (SOCS) can also be used for stem and leaf plots


Split Stems
Some stem plots are too bunched (there are too many leaves on one stem), so it is necessary to split the stems. Each stem is usually split into halves or fifths.

Eg. The stem 2 (representing 20) can be split into:
halves: 2 (20–24), 2 (25–29)
fifths: 2 (20–21), 2 (22–23), 2 (24–25), 2 (26–27), 2 (28–29)

Example A3
Construct a single stem, and a stem split into fifths for the following data.

1.5 0.2 1.2 1.3 0.9 1.8 1.9 1.7 0.7 1.6
1.2 1.0 1.6 1.4 1.1 1.5 1.6 1.5 1.7

Single stem (Key: 1|2 = 1.2)
Stem  Leaf
0     2 7 9
1     0 1 2 2 3 4 5 5 5 6 6 6 7 7 8 9

Stem split into fifths (Key: 1|2 = 1.2)
Stem  Leaf
0
0     2
0
0     7
0     9
1     0 1
1     2 2 3
1     4 5 5 5
1     6 6 6 7 7
1     8 9
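The ordering and splitting steps in Examples A2 and A3 can be sketched in Python. `stem_plot` is a hypothetical helper, not a CAS feature: it assumes non-negative data and a single-digit leaf whose place value is `leaf_unit`. Splitting into fifths puts leaves 0–1, 2–3, 4–5, 6–7 and 8–9 on separate rows, and empty rows are kept, as in Example A3.

```python
def stem_plot(values, leaf_unit=1, parts=1):
    """Return an ordered stem plot as a list of (stem, leaves) rows.

    leaf_unit: place value of one leaf, eg 1 (key 1|2 = 12) or 0.1 (key 1|2 = 1.2).
    parts: 1 = single stems, 2 = split into halves, 5 = split into fifths.
    Assumes all values are non-negative.
    """
    if parts not in (1, 2, 5):
        raise ValueError("parts must be 1, 2 or 5")
    width = 10 // parts  # number of possible leaf digits on each row
    rows = {}
    for v in sorted(values):
        n = round(v / leaf_unit)  # value in leaf units, eg 1.2 -> 12
        stem, leaf = divmod(n, 10)
        rows.setdefault((stem, leaf // width), []).append(leaf)
    stems = [stem for stem, _ in rows]
    plot = []
    for stem in range(min(stems), max(stems) + 1):
        for part in range(parts):
            plot.append((stem, rows.get((stem, part), [])))
    return plot

def show(plot):
    for stem, leaves in plot:
        print(f"{stem} | {' '.join(map(str, leaves))}")

scores = [12, 45, 67, 45, 34, 54, 87, 86, 80, 40, 23, 48, 69, 71]
show(stem_plot(scores))  # Example A2, key 1|2 = 12

data = [1.5, 0.2, 1.2, 1.3, 0.9, 1.8, 1.9, 1.7, 0.7, 1.6,
        1.2, 1.0, 1.6, 1.4, 1.1, 1.5, 1.6, 1.5, 1.7]
show(stem_plot(data, leaf_unit=0.1, parts=5))  # Example A3, key 1|2 = 1.2
```

Rounding `v / leaf_unit` to a whole number guards against floating-point results such as 1.2 / 0.1 = 11.999….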

WHEN TO USE WHICH GRAPH:

Graph                 Type of data        Limit on data size
Bar chart             Categorical data
Histogram             Numerical data      Best for medium to large data sets (> 40 values)
Stem and leaf plot    Numerical data      Best for small to medium data sets (< 50 values)
Dot plot              Numerical data      Suitable only for small data sets (< 20 values)

Exercise 2A: 2, 3, 4, 5


Exercise 2B: The median, range and interquartile range (IQR)


Summary Statistics
The two most common types of measure used to summarise a distribution are measures of centre and measures of spread.

Calculating the Median


The median is a measure of centre. The median is located by listing all the data values in
numerical order and then finding the value that divides the distribution into two equal parts. For
small data sets, once the data is ordered the median can be easily located by the eye.

For example, to find the median of the data set
3 5 1 4 8
we firstly write out the data values in numerical order:
1 3 4 5 8
and then locate the midpoint of the data set:
1 3 4 5 8
    ^
median = 4

For an odd number of data values, the median will be one of the data values as above. For an even
number of data values, the median does not coincide with an actual data value. For example, to
locate the median of the data set:
5 3 4 8
we firstly write out the data values in numerical order and then locate the midpoint of the data
set. In this case the midpoint lies halfway between 4 and 5, that is, at 4.5:
3 4 5 8
^
median = 4.5

For n data values, the MEDIAN is located at the $\frac{n+1}{2}$th position.

By definition, half the data (50%) lies above the median and half below.
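The $\frac{n+1}{2}$ position rule can be sketched in Python (`median` here is a hypothetical helper). When n is even the position falls halfway between two values, so their average is taken. Python's standard `statistics.median` follows the same convention and can be used as a cross-check.

```python
def median(data):
    """Median found with the (n + 1) / 2 position rule."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th value is at 0-based index n // 2
        return ordered[mid]
    # even n: (n + 1) / 2 falls halfway between the (n / 2)th and (n / 2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5, 1, 4, 8]))                 # 4
print(median([5, 3, 4, 8]))                    # 4.5
print(median([1, 8, 7, 6, 5, 4, 2, 2, 3, 6]))  # 4.5 (Example B1)
```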


Example B1
Find the median of the following data set:
1 8 7 6 5 4 2 2 3 6

10 scores, ordered: 1, 2, 2, 3, 4, 5, 6, 6, 7, 8, so the median is the (10 + 1) ÷ 2 = 5.5th value, halfway between 4 and 5: median = 4.5

The Range
The range is a measure of spread.

RANGE = Highest value – Lowest value = $x_{\max} - x_{\min}$

Example B2
What is the range for the data in Example B1?

Range = 8 – 1 = 7

Example B3
Calculate the median and range from the stem and leaf plot below:

27 scores, so the median is the (27 + 1) ÷ 2 = 14th score: median = 3.1 km²

range = 8.4 – 1.5 = 6.9 km²


The Interquartile Range (IQR)


The Interquartile range is a measure of spread. Just as the median is the point that divides a
distribution in half, quartiles are the points that divide a distribution into quarters.
We use the symbols, Q1, Q2 and Q3 to represent the quartiles.

Q1 (the lower quartile): the median of the lower half of your data
Q2: the median
Q3 (the upper quartile): the median of the upper half of your data

Note: when there is an odd number of data values, the median itself is not included in either the lower or the upper half when calculating the quartiles.

Interquartile Range (IQR) = Q3 – Q1

The interquartile range is a measure of spread of the distribution that describes the range of the
middle 50% of observations.

The IQR is not affected by outliers, because it ignores the lowest 25% and the highest 25% of the data. For this reason it is often a more useful measure of spread than the range.

Example B4
For each of the following sets of data, find:
i. the lower quartile
ii. the upper quartile
iii. the interquartile range

a) 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8
Q1 = 3
Q3 = 6
IQR = 6 – 3 = 3

b) 2, 3, 3, 3, 4, | 5, 6, 6, 6, 7
Q1 = 3
Q3 = 6
IQR = 6 – 3 = 3
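The quartile method above (split the ordered data into halves, leaving the median out when n is odd, then take the median of each half) can be sketched in Python. The helper names are hypothetical. Other software, including Python's own `statistics.quantiles`, uses different quartile conventions and can give slightly different values, so this sketch follows the working shown in these notes.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, median, Q3); the median is left out of both halves when n is odd."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    lower = ordered[: n // 2]          # values below the median
    upper = ordered[n // 2 + n % 2 :]  # values above the median
    return _median(lower), _median(ordered), _median(upper)

def iqr(data):
    q1, _, q3 = quartiles(data)
    return q3 - q1

print(quartiles([1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8]))  # (3, 4, 6)
print(quartiles([2, 3, 3, 3, 4, 5, 6, 6, 6, 7]))     # (3, 4.5, 6)

# Example B5 (key 15 | 4 = 154)
b5_values = [154, 158, 158, 161, 163, 163, 166, 168,
             170, 170, 171, 174, 177, 179, 179, 179,
             181, 182, 183, 183, 185, 187, 188, 188, 189,
             192, 197, 198, 200, 202]
print(iqr(b5_values))  # 20
```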

Summary statistics on the calculator


This can be used to determine the mean, standard deviation, mode, median, minimum, maximum, upper
and lower quartiles.
Step 1: From the menu, go to the Statistics screen
Step 2: Enter all data values into list 1
Step 3: Tap “Calc > One-Variable”
Step 4: Make sure that Xlist is set to “list1” and tap OK.


Example B5
Calculate the IQR of the data displayed in the stem and leaf plot below.

STEM LEAF Key: 15 | 4 = 154


15 4 8 8
16 1 3 3 6 8
17 0 0 1 4 7 9 9|9
18 1 2 3 3 5 7 8 8 9
19 2 7 8
20 0 2
IQR = 188 – 168 = 20

Exercise 2B : 1, 3, 4, 5, 6, 8
Holiday Homework Worksheet: Univariate Data Exam Questions

Exercise 2C: The Five Number Summary and the Box plot

The “five number summary” is:


[Minimum value, Q1, Median, Q3, Maximum value]

The Box Plot


The box plot is a graphical representation of a five number summary (numerical data).
A box plot is a very compact way of clearly displaying the location, spread (median, IQR, range)
and general shape of a distribution.

When constructing a box plot:


● the box represents the middle 50% of scores
● the median is shown by a vertical line drawn within the box
● lines (called whiskers) are extended out from the lower and upper ends of the box to the smallest and largest data values
● each section of the box plot contains one quarter (25%) of the data.

Outliers can be shown as a dot or a cross. If there is an outlier, the “whisker” only extends to the
lowest/highest value that is not an outlier.


Relating a box plot to the shape of a distribution


We can describe the shape of box plots using the same terms that we used for histograms.

Box plots with outliers


To determine whether a value is an outlier, we need to calculate a “lower fence” and an “upper
fence”. Any data value that lies outside these fences is considered an outlier.

Lower fence = Q1 – 1.5 x IQR


Upper fence = Q3 + 1.5 x IQR

On a boxplot, outliers are marked with a dot or a cross.

Example C1
a) Calculate the 5 number summary and then construct a box plot for the following data. Use and
label an appropriate scale.
36 35 34 32 37 35 38 32 35 37
ordered : 32, 32, 34, 35, 35, 35, 36, 37, 37, 38
min = 32, Q1 = 34, med = 35, Q3 = 37, max = 38

b) describe the box plot in terms of shape.


Approximately symmetric (although this one is not clear!)

c) calculate the upper and lower fences, and hence state whether there are any outliers.
IQR = 37 – 34 = 3
Lower fence = 34 – 1.5 × 3 = 29.5
Upper fence = 37 + 1.5 × 3 = 41.5
Since all values lie between 29.5 and 41.5, there are no outliers.
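The five-number summary and the fence rule can be sketched together in Python. The helpers are hypothetical; the quartiles are found as in Exercise 2B (the median is left out of both halves when n is odd), and a value counts as an outlier only if it lies strictly outside the fences.

```python
def _median(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """[min, Q1, median, Q3, max]; the median is left out of both halves when n is odd."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    lower = ordered[: n // 2]
    upper = ordered[n // 2 + n % 2 :]
    return [ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1]]

def fences(q1, q3):
    """Lower fence = Q1 - 1.5 x IQR, upper fence = Q3 + 1.5 x IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    """Values lying outside the fences."""
    _, q1, _, q3, _ = five_number_summary(data)
    low, high = fences(q1, q3)
    return [x for x in sorted(data) if x < low or x > high]

data = [36, 35, 34, 32, 37, 35, 38, 32, 35, 37]
print(five_number_summary(data))  # [32, 34, 35.0, 37, 38]
print(fences(34, 37))             # (29.5, 41.5)
print(outliers(data))             # []
```

For a data set that does have an outlier, `outliers([2, 6, 8, 10, 14, 15, 50])` gives `[50]`, since Q1 = 6, Q3 = 15 and the upper fence is 15 + 1.5 × 9 = 28.5.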


USING CAS to draw boxplots

Step 1: In the Statistics screen, enter all data values into list1.
Step 2: Tap the graph settings icon (alternatively, tap “SetGraph > Setting”).
Step 3: Make sure “Draw” is set to “On”, then select:
Type: “MedBox”
XList: list1
Freq: 1
Show Outliers (always make sure this box is ticked!)
Tap “Set”.
Step 4: Tap the graph icon.
Step 5: Tap the trace icon (alternatively, tap “Analysis > Trace”), then scroll left and right using the cursor keys to show values on the box plot.

DESCRIBING THE DISTRIBUTION OF A BOXPLOT


When describing box plots we are typically asked to refer to shape, centre and spread, as well as the presence of any outliers.

“SOCS” = Shape, Outliers, Centre (median), Spread (range & IQR)

Report template

The distribution is approximately symmetric/positively skewed/negatively skewed


with outlier(s) at _____ (or “with no outliers”).
The centre of the distribution, as measured by the median, is ________.
The spread of the distribution, as measured by the IQR, is ______ .


Comparing box plots


We can also be asked to make comparisons between multiple box plots. These are called
“parallel” box plots when drawn on the same scale (these are explored further in Bivariate Data)

Example C2
Using the box plots shown, answer the following questions:
a) In which month was the temperature generally higher?
May

b) Compare the distributions for July and May in terms of centre and spread.
The centre, as measured by the median, is higher for May (approximately 14.5°C) than for July (approximately 9°C).

The spread, as measured by the interquartile range, was greater in May (approximately 6.5°C) than in July (approximately 3°C).

Exercise 2C: 1, 2, 3a, 4, 6, 7, 8, 9


Exercise 2D: 1
Exercise 2E: 1, 2, 3

Exercise 2F: Describing the centre and spread of symmetric distributions


The Mean (average)
The mean is also known as the average. It is found by adding together each value, then dividing
by the number of values.
$\text{mean}\ (\bar{x}) = \dfrac{\text{sum of values}}{\text{total number of values}}$

It is important to note that the mean is affected by outliers.

Mean on the CAS


1. Statistics > Enter data into list1
2. Tap “Calc > One variable > OK”
Note that x̄ is the mean


Example F1
Calculate the mean and median of the following sets of data.
a) 2 6 8 10 14 15 15
mean = (2 + 6 + 8 + 10 + 14 + 15 + 15) ÷ 7 = 70 ÷ 7 = 10
median = 10

b) 2 6 8 10 14 15 50
mean = (2 + 6 + 8 + 10 + 14 + 15 + 50) ÷ 7 = 105 ÷ 7 = 15
median = 10
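Example F1 shows that a single large value pulls the mean up while the median stays put. A quick check with Python's standard `statistics` module (used here only as an illustration):

```python
from statistics import mean, median

set_a = [2, 6, 8, 10, 14, 15, 15]
set_b = [2, 6, 8, 10, 14, 15, 50]  # identical except the largest value, 15 -> 50

for label, data in (("a", set_a), ("b", set_b)):
    print(label, "mean =", mean(data), "median =", median(data))
```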

Mean of Ungrouped and Grouped Data


Sometimes we may be required to calculate the mean from data that is grouped (e.g. in a
frequency table, histogram or barchart)

Example F2
Use the “mid point” method to find the mean from each of the following frequency tables:

a) Complete the table below assuming that the data variable is discrete, then calculate the mean from the table.

class interval   midpoint   freq.   xf
220–259          239.5      2       479
260–299          279.5      3       838.5
300–339          319.5      5       1597.5
340–379          359.5      6       2157

Mean = (479 + 838.5 + 1597.5 + 2157) ÷ (2 + 3 + 5 + 6)
     = 5072 ÷ 16
     = 317

b) Complete the table below assuming that the data variable is continuous, then calculate the mean from the table.

class interval   midpoint   freq.   xf
220–             240        1       240
260–             280        7       1960
300–             320        5       1600
340–             360        4       1440

Mean = (240 + 1960 + 1600 + 1440) ÷ (1 + 7 + 5 + 4)
     = 5240 ÷ 17
     = 308.235 (rounded to 3 d.p.)
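The midpoint method can be sketched in Python. `grouped_mean` is a hypothetical helper that takes (lower, upper, frequency) rows and uses (lower + upper) ÷ 2 as each midpoint. For continuous classes written 220–, 260–, …, we assume the upper end of each class is the start of the next class, which gives the midpoints 240, 280, 320 and 360.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) rows using class midpoints."""
    total_xf = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2
        total_xf += midpoint * freq  # the xf column
        total_f += freq
    return total_xf / total_f

# a) discrete classes 220-259, 260-299, ...
discrete = [(220, 259, 2), (260, 299, 3), (300, 339, 5), (340, 379, 6)]
# b) continuous classes 220-, 260-, ...: upper end taken as the next class boundary
continuous = [(220, 260, 1), (260, 300, 7), (300, 340, 5), (340, 380, 4)]

print(grouped_mean(discrete))               # 317.0
print(round(grouped_mean(continuous), 3))   # 308.235
```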


Choosing between the mean and the median


The mean and median are both measures of the centre of the distribution.
If the distribution is:
● Symmetric and there are no outliers, either the mean or the median can be used to indicate the centre of the distribution.
● Skewed and/or there are outliers, it is more appropriate to use the median to indicate the centre of the distribution.

Exercise 2F-1: 1, 3, 4, 5, 7, 8
Worksheet: Mean from grouped data

The Standard Deviation, Sx


● The standard deviation, s, measures the spread of data values around the mean:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
● The variance, s², is the square of the standard deviation and is also a measure of spread, e.g. if the variance = 9, then SD = √9 = 3.
● If the standard deviation is small compared to the mean (eg mean = 100 and s.d. = 3), then there is a small spread of data.
● If the distribution is highly skewed, or if there are outliers, then the IQR is a better measure of spread than the standard deviation.

Finding the standard deviation on the CAS (this is the only way you need to know!)
1. Statistics  Enter data into list1
2. Tap “Calc > One variable > OK”
Note that sx is the standard deviation

Example F3
Use your calculator to find the standard deviation of the following data set, correct to three
significant figures:
76, 75, 79, 69, 80, 74, 83, 66

Sx = 5.6505373… ≈ 5.65
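The CAS is the expected method, but the formula can be checked in Python: `sample_sd` (a hypothetical helper) follows the n – 1 formula for s, and the standard library's `statistics.stdev` computes the same sample standard deviation.

```python
import math
from statistics import stdev

def sample_sd(data):
    """s = sqrt( sum of (x - mean)^2 / (n - 1) )."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

data = [76, 75, 79, 69, 80, 74, 83, 66]
print(round(sample_sd(data), 2))  # 5.65 (3 significant figures)
print(round(stdev(data), 2))      # 5.65: the standard library agrees
```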

Exercise 2F-2: 1, 2, 3, 4, 6


2G: The normal distribution and the 68-95-99.7% Rule

The 68-95-99.7% rule below applies to distributions that are approximately normal (bell-shaped); it does not hold for every distribution.

For any NORMAL (bell-shaped) distribution, approximately:
● 68% of the observations lie within one standard deviation of the mean
● 95% of the observations lie within two standard deviations of the mean
● 99.7% of the observations lie within three standard deviations of the mean

[Figure: normal curve with the intervals mean ± 1SD, mean ± 2SD and mean ± 3SD marked]


Example G1
The number of matches in a box is not always the same. When a sample of boxes was studied, it was found that the number of matches in a box approximated a normal (bell-shaped) distribution with a mean of 50 matches and a standard deviation of 2.
What percentage of boxes would be expected to have more than 48 matches?
48 is 1 s.d. below the mean.
Since 68% lies within 1 s.d. of the mean, (100% – 68%) ÷ 2 = 16% of the distribution is below 48, which means 84% is above 48.

Example G2
VCE study scores are normally distributed with a mean of 30 and standard deviation of 7.

a) What score would be needed to be in the top 16% of the state?


16% of the distribution is more than 1 s.d. above the mean, so a score of 30 + 7 = 37 or more is needed to be in the top 16%.

b) What percentage of students are expected to score between 23 and 44?

23 is 1 s.d. below the mean and 44 is 2 s.d. above it, so 34% of scores lie between 23 and 30 and 47.5% lie between 30 and 44.
34% + 47.5% = 81.5%, therefore 81.5% of students are expected to score between 23 and 44.

c) In a class of 25 students, how many would be expected to score between 30 and 37? Answer to the nearest whole number.
34% of scores lie between 30 and 37, and 34% of 25 = 8.5, therefore 9 students.
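The 68-95-99.7% rule can be turned into a small lookup in Python: split the normal curve into bands at whole-number z-scores (34%, 13.5%, 2.35% and 0.15% either side of the mean, as used in Examples G1 and G2) and add up the bands you need. This is a hypothetical sketch that only handles whole-number z-scores (or ± infinity), and the band percentages are the rule's approximations rather than exact normal probabilities.

```python
import math

# Approximate % of a normal distribution in each band between whole-number
# z-scores, as implied by the 68-95-99.7% rule (eg 68 / 2 = 34 either side).
BANDS = {
    (-math.inf, -3): 0.15,
    (-3, -2): 2.35,
    (-2, -1): 13.5,
    (-1, 0): 34.0,
    (0, 1): 34.0,
    (1, 2): 13.5,
    (2, 3): 2.35,
    (3, math.inf): 0.15,
}

def percent_between(z_low, z_high):
    """% of a normal distribution between two whole-number z-scores."""
    return sum(p for (a, b), p in BANDS.items() if z_low <= a and b <= z_high)

def z(x, mean=30, sd=7):
    """Standard score for Example G2 (mean 30, standard deviation 7)."""
    return (x - mean) / sd

print(percent_between(1, math.inf))   # approximately 16: more than 1 SD above the mean
print(percent_between(z(23), z(44)))  # 81.5: from z = -1 to z = 2
```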


Example G3
IQ scores are normally distributed. Given that 95% of IQ scores lie between 70 and 130, find the
mean and standard deviation of IQ scores.

95% of normal distribution lies within 2 s.d. either side of the mean.
So 70 is 2 s.d. below mean, 130 is 2 s.d. above mean.
Therefore, mean = 100 (midpoint of 70 and 130) and s.d. = 15 (30 ÷ 2)
Mean = 100
Standard deviation = 15

Exercise 2G : 1, 2, 3, 5

Exercise 2H: Standard Scores


These are also known as ‘z-scores’, and allow comparisons of distributions with different means
and standard deviations.

The standard score is:

$\text{standard score} = \dfrac{\text{data value} - \text{mean}}{\text{standard deviation}}$   or   $z = \dfrac{x - \bar{x}}{s}$

Standard scores can be positive or negative:
● Positive z-scores indicate the data value is above the mean
● Negative z-scores indicate the data value is below the mean
● A z-score of zero indicates that the data value is equal to the mean

For example, a z-score of 1 means the score is 1 SD above the mean.

Example H1
In an IQ test, the mean IQ is 100 and the standard deviation is 15. Dale’s test results give
an IQ of 130. Calculate this as a z-score. Interpret this information.
$z = \dfrac{130 - 100}{15} = 2$

Dale’s score is 2 standard deviations above the mean (which puts him in the top 2.5% of the population)
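The standard score formula translates directly into Python. The sketch below (with a hypothetical helper `z_score`) reproduces Example H1 and the comparison in Example H3.

```python
def z_score(x, mean, sd):
    """Standard score z = (x - mean) / sd."""
    return (x - mean) / sd

print(z_score(130, 100, 15))  # 2.0: Dale's IQ (Example H1)
print(z_score(75, 65, 10))    # 1.0: Psychology (Example H3)
print(z_score(70, 60, 5))     # 2.0: Statistics (Example H3)
```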


You may also be asked to use the standard score formula to determine an “actual” value.
In these cases, “Action > Advanced > Solve” can be used on the calculator.

Example H2
The lengths of ants in a colony are normally distributed with a mean of 4.8 mm and a standard deviation of 1.2 mm.
An ant with a standardized length of z = –0.5 corresponds to what actual length?
$-0.5 = \dfrac{x - 4.8}{1.2}$, solve: x = 4.2 mm
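Rearranging $z = \frac{x - \bar{x}}{s}$ gives $x = \bar{x} + z \times s$, so the calculator's Solve step can also be done directly. A minimal Python sketch with a hypothetical helper `actual_value`:

```python
def actual_value(z, mean, sd):
    """Actual value from a standard score: x = mean + z * sd."""
    return mean + z * sd

length = actual_value(-0.5, 4.8, 1.2)
print(f"{length:.1f} mm")  # 4.2 mm (Example H2)
```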

Using standard scores to compare performance


Standard scores can be used to compare performance in two different distributions with
different means and standard deviations.

Example H3
A student obtained the following marks in two exams:
Subject Mark Mean Std Dev

Psychology 75 65 10
Statistics 70 60 5

In which subject did she do better? Show your calculations.


Psychology standard score: $z = \dfrac{75 - 65}{10} = 1$

Statistics standard score: $z = \dfrac{70 - 60}{5} = 2$

Using standardized scores, this student performed better on the Statistics test.

Exercise 2H : 1ace, 2ace, 3, 4


2I : Populations and Samples

● A population, in statistics, is a group of people (or objects) to whom you can apply any conclusions or generalisations that you reach in your investigation.

● A sample, in statistics, is a smaller group of people (or objects) who have been chosen from the population and are involved in the investigation.

● A simple random sample (SRS) is a random selection from the population such that every member of that population has an equal chance of being chosen in the sample, and the choice of one member does not affect the choice of another member. An SRS can be generated using your CAS.

Simple Random Sample (SRS) on your CAS:

To generate a list of random values on the calculator:

Step 1: In the Main screen, tap “Keyboard > Catalog”

Step 2: Tap on the letter “R” and then tap on “randList(” twice

Step 3: Input values in the following format:


randList (number of selections, minimum value, maximum value)

For example, to select 3 random numbers between 1 and 20, type: randList(3,1,20)

NOTE: If you are given a list of data and have to find a random sample, you need to number your data 1, 2, 3, etc. The numbers given by your calculator indicate the POSITIONS of the values in the random sample in the data; they are NOT THE REAL DATA VALUES.

Example I2
The following data represents the ages of 20 people in an aerobics class. Find a random sample of
8 people.

Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Age:      42 17 18 36 19 22 25 21 20 38 33 30 16 19 25 25 26 25 22 17

a) Assign a number from 1 to 20 to each data value.

b) Use your calculator to select a simple random sample of 8 people from this class. Write down the ages of the 8 people in the sample.

randList(8, 1, 20) gives, for example, {7, 3, 18, 15, 12, 6, 9, 10}, so the ages are 25, 18, 25, 25, 30, 22, 20, 38.
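The same procedure can be sketched in Python with the standard `random.sample`, which draws distinct positions (so no person is chosen twice), followed by a lookup of the ages at those positions. The helper name `simple_random_sample` is hypothetical, and the sample will differ each time the code is run.

```python
import random

ages = [42, 17, 18, 36, 19, 22, 25, 21, 20, 38,
        33, 30, 16, 19, 25, 25, 26, 25, 22, 17]

def simple_random_sample(data, k, rng=random):
    """Pick k distinct 1-based positions, then look up the data values at those positions."""
    positions = rng.sample(range(1, len(data) + 1), k)
    return positions, [data[p - 1] for p in positions]

positions, sample = simple_random_sample(ages, 8)
print("positions:", positions)
print("ages:", sample)
```

Passing a seeded generator, eg `random.Random(2019)`, as `rng` makes the selection repeatable, which can help when checking working.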

Exercise 2I: 1, 2, 3 (from online book, or on Compass)

Chapter 2 Review: MCQ 1-29, EXT 1-5
