
Data

Data – Any information collected is called data


Data types and classification

Q1. Classify each of the following data using two selections from the following descriptive words: categorical,
numerical, nominal, ordinal, discrete and continuous.
a) The number of students absent from school. Answer: numerical, discrete
b) The types of vehicles using a certain road. Answer: categorical, nominal
c) The various pizza sizes available at a local take-away. Answer: categorical, ordinal
d) The room temperature at various times during a particular day. Answer: numerical, continuous
Q2. Match each word with its correct meaning:
a) Discrete i) placed in categories or classes
b) Categorical ii) counted in exact values
c) Ordinal iii) data in the form of numbers
d) Continuous iv) needs further names to complete the description
e) Numerical v) needs a ranking order
f) Nominal vi) measured in decimal numbers

Q3. Classify each of the following data using two words selected from the following descriptive words:
categorical, numerical, nominal, ordinal, discrete and continuous.
a) The population of your town or city
b) The types of motorbike in a parking lot
c) The heights of people in an identification line-up
d) The masses of babies in a group
e) The languages spoken at home by students in your class
f) The time spent watching TV
g) The number of children in the families in your suburb
h) The air pressure in your car’s tyres
i) The number of puppies in a litter
j) The types of radio program listened to by teenagers
k) The times for swimming 50 metres
l) The quantity of fish caught in a net
m) The number of CDs you own
n) The types of shops in a shopping centre
o) The football competition ladder at the end of each round
p) The lifetime of torch batteries
q) The number of people attending a rock concert
r) Exam grades
s) The types of magazine sold at a newsagency
t) Hotel accommodation rating

Q4. Data representing shoe sizes can be classified as:

A) categorical, nominal B) categorical, ordinal


C) numerical, discrete D) numerical, continuous

Data collection
One of the first decisions to be made when collecting data is to decide from whom, or from what, the
information is to be collected. There are two types of data collection:
 a census

Census: A census involves collecting information from every individual in the whole population.
The population is all the people or objects you want data about. For example, if all new cars are
tested before sale, this is a census. The Australian Bureau of Statistics (ABS) conducts a census of
the entire population of Australia every 5 years.

o A census is accurate and detailed, but also expensive, time consuming and often
impractical.
 a sample survey.

Sample survey: A sample survey involves collecting data from only a part of the population. It is
cheaper and quicker than a census, but not as detailed or accurate. Conclusions drawn from sample
surveys always involve some error. Often this error is due to a bias in the sample or method of data
collection.

o A biased sample does not truly represent the whole population.


o A random sample is fair. Every member of the population has an equal chance of being chosen, so the sample is much more likely to represent the whole population.
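Choosing a random sample can be sketched in Python. The population here is a made-up list of 30 student numbers (an assumption for illustration, not data from this worksheet), and the standard `random` module picks 5 of them so that each student has the same chance of being chosen.

```python
import random

# Hypothetical population: student numbers 1 to 30 (not from the worksheet).
population = list(range(1, 31))

# The seed is fixed only so the example can be repeated exactly;
# a real survey would not fix it.
rng = random.Random(7)

# Choose 5 different students; each student is equally likely to be picked.
sample = rng.sample(population, 5)
print(sample)
```

A sample chosen this way can still differ from the population by chance, which is one reason larger samples are preferred.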

EXAMPLE 1: State whether a census or a sample survey would be used to investigate each situation.

a) the length of time an electric light globe will last


b) the causes of car accidents in NSW
c) the number of people who use Bright Teeth toothpaste

Solution:
a) Sample: It would obviously be impractical to test every light globe produced until it failed—there would be
none to sell!
b) Census: An accurate analysis of all accidents would be important.
c) Sample: It would be very time consuming and expensive to interview the whole population to find out who
uses Bright Teeth toothpaste.
Do the following exercise:

1. State whether a census or sample survey would be used for each of these investigations. Discuss your
answers in groups.
a) the number of goals scored each week by a netball team
b) the heights of the members of a football team
c) the most popular radio station
d) the number of children in an Australian family
e) the number of loaves of bread bought each week by a family

f) the types of pets owned by a class of students

g) the star ratings of different brands of washing machines

h) the numbers of leaves on the stems of plants

i) the amount of sunshine each day

j) the number of people who die from cancer each year

k) the amount of rainfall each month

l) the time spent doing homework each night

m) the countries of origin of immigrants

n) the most popular colours of cars

o) the numbers of pets owned by students in a class

p) the genders of school principals

q) the number of cars passing through an intersection

r) the sports played by students in high schools

s) the stopping distances of cars

t) the marks scored in a class test

u) the items sold at a school canteen

v) the number of matches in a box

w) the amounts raised by school raffles

x) the reasons people use taxis

y) the fuel consumption of different cars


EXAMPLE 2
Suggest the possible bias in each of the following samples.
a) people surveyed by phone during the day
b) people surveyed at a train station
c) people selected from a football crowd
d) ten people tested with a new drug

Solution:

a) The sample would be biased towards people who are at home during the day and have a phone. It does
not include people who go to work in the daytime or do not have a phone.
b) The sample would be biased towards people who catch the train at that station. It does not include
people who use other forms of transport or do not travel or use a different station.
c) The sample would be biased towards people who attend football matches. For example, there would
probably be more males than females at football matches.
d) The sample would be biased by the characteristics of the ten people. A larger sample is needed.

2. Explain and discuss any possible bias in the following samples.

a) people surveyed by phone on a Saturday night

b) people at a bus stop

c) people in a supermarket carpark

d) people at the beach

e) people in your street

f) businesses selected from the Yellow Pages phone directory

g) people selected from the electoral roll

3. Comment on any possible bias in the following situations.

a) Year 7 students are interviewed about school uniform changes.

b) Motorists stopped in peak hour are interviewed about traffic problems.

c) Real estate agents are interviewed about house prices.

d) Politicians are interviewed about the state of the country’s economy.

e) People are asked to phone in to register their vote on an issue.

f) An opinion poll is conducted by posting a questionnaire to people.

g) A manufacturing company tests a sample of its products every Monday morning.

h) A survey of 20 people indicates that 80% of people watch the Channel 9 News.

i) A company claims that 4 out of 5 dentists recommend their brand of toothbrush.

j) An advertisement claims that ‘Dog breeders recommend Buddy dog food’.

Collecting data for surveys and questionnaires


Great care must be taken when framing questions to collect data by post or by personal interview. It is
common for people to misunderstand the point of a question. The answers to the questions must be in a form
that makes them easy to collate.
Key questions
In any census or sample survey it is very important to ask clear and relevant questions. The questions need to
be structured so that the data obtained is easy to use.
EXAMPLE 1
Year 7 students are conducting a sample survey to investigate people’s eating habits at their evening meal.
Comment on the suitability of the following questions.
a Is the evening meal your main meal of the day?
b Do you eat in front of the television?
c What do you watch?
d How often do you eat take-away food?
e How would you describe your evening meal?
f At what time do you eat your evening meal?

Solution:
a Relevant question, but it should include ‘usually’ to eliminate the once-in-a-while meals that are not typical.
The expected responses would be Yes or No.
b Somewhat relevant question, but it should be reworded to ask the usual location of the evening meal. The
expected responses might include: at the dining table, in front of the TV, etc.
c Irrelevant question. It also assumes that the answer to the previous question was yes.
d Relevant question, but it needs to be reworded. Does it mean in a week, or over a month? Does the question
refer to the evening meal only, or does it include other meals as well?
e Relevant question, but it may be difficult to use the responses. The question needs to include ‘usual’ and be
restructured to include choices like: ‘red meat and vegetables’, ‘chicken and vegetables’, ‘vegetarian’, etc.
f Relevant question, but it needs to include time slots to tick.

Q1: These questions have been suggested for use in a survey about the different methods of transport used by
people travelling to work. Comment on their appropriateness and possible responses. Reword if necessary.
a Do you own a car?
b What colour are trains?
c How often do you drive to work?
d What type of public transport do you use?
e How often do you travel to work by public transport?

Q2 These questions have been suggested for use in a survey about school uniforms. Comment on their
appropriateness and possible responses. Reword if necessary.
a Do you like the present uniform?
b Do you want to wear a uniform?
c What is your favourite colour?
d How old are you?
e Have you attended a school that doesn’t have a uniform?
f Is your uniform comfortable?

Q3 These questions have been suggested for use in a survey of what people watch on TV. Comment on their
appropriateness and possible responses. Reword if necessary.
a Do you own a TV?
b Do you watch TV?
c How many TVs are there in your house?
d What is your favourite program?
e Do you like sport?
f Which is your favourite channel?
g Do you lie down while watching TV?

Types of questions
The main types of questions are:
• Free-response or open-ended questions, like ‘What is your favourite TV program?’ The person answers the
question in their own words.
• Yes or No questions, like ‘Did you watch program X last week?’ The person answers Yes or No to the
question.
• True or False questions, which are similar to Yes or No questions.
• Tick-box questions, like ‘What do you like watching on TV? Tick one or more boxes.’
□ Nothing □ News □ Sport □ Drama □ Comedy □ Soapies □ Other
Options must include all possible responses.
• Scaled-response questions, like ‘Do you think that there should be more Australian programs on TV?
Circle a number.’
1 Strongly disagree 2 Disagree 3 Don’t know 4 Agree 5 Strongly agree
The number of options must be odd and there must be a neutral option.

Primary and secondary data


Primary data is data that you collect. There are a number of ways you can collect primary data, such as by:
• Direct observation: This allows you to specifically focus on the details or aspects that are important to your
research.
• Surveys: Considerable amounts of data can be collected by surveys. However, there is an assumption that the
person completing the survey is trustworthy with their responses.
• Interviews: Although time consuming and expensive, interviews allow for in-depth questions and responses.
They also allow interpretation of facial expressions, body language and sarcastic remarks. Again honesty can
be an issue, as the person being interviewed may feel inclined to say what they think the interviewer wants to
hear.
• Logs: These can produce a lot of valuable data regarding particular performances over time under various
conditions. Examples are complaint logs, transaction logs and fault logs.
Secondary data is data collected from external sources such as the internet, television, radio, newspapers,
magazines, journals, research papers, reviews or by hearsay. It is usually in the form of a ready-made product,
so averages, conclusions, etc. are already calculated and stated.
Using secondary data can save time and costs, but the data can lack originality as it has been collected by
other agencies. It may also not be suitable or relate directly to what is being investigated.
Primary data is more costly, but it is also likely to be more suitable and trustworthy. Secondary data, although
cheaper and relatively easy to obtain, needs to be used cautiously.
Exercise.
1. a) Define primary data.
b) List two positive features of primary data.
c) List two negative features of primary data.
d) List two forms of primary data.
2. a) Define secondary data.
b) List two positive features of secondary data.
c) List two negative features of secondary data.
d) List two forms of secondary data.

3. Consider the illustrations shown below. State whether each is an example of primary or secondary data.
• Survey: The process of collecting data is called a survey.
• Displaying/organising data: Once the data is collected, it can be organised in the form of tables and graphs.
• The process of organising data in a table is called tabulating.
• The table in which the data is organised is called a frequency distribution table.

Displaying data – Frequency distribution table


Once data has been collected, the next step is to organise the data before it is analysed and displayed. One
method of doing this is to use a frequency distribution table.

 A frequency distribution table is a table that displays the frequency (the number of times each piece
of data occurs) for each of the categories of data.
 Tally marks are often used to help record the data in the table.
 Every fifth tally mark is placed through the four preceding tally marks to make counting easier.

Example 1: A census is taken of a Year 7 class. The method by which the students travelled to school on a
particular day is recorded below using the code: Walk (W), Cycle (C), Bus (B), Train (T) and Car (M).

W C B W C B B B W B B B C B T C M C B T M M T M M M W C C B

Rearrange this information into a frequency distribution table using a tally column.

Method of travel Tally Frequency


Walk
Cycle
Bus
Train
Car
Total
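The tallying in Example 1 can also be checked with a short Python sketch. It uses the 30 codes listed above; the helper `tally_marks` is a hypothetical name, and it writes each bundle of five as `||||/` because a crossed tally cannot be typed directly.

```python
from collections import Counter

# Codes recorded for the 30 students in Example 1.
raw = "W C B W C B B B W B B B C B T C M C B T M M T M M M W C C B".split()
names = {"W": "Walk", "C": "Cycle", "B": "Bus", "T": "Train", "M": "Car"}

# Count how many times each code occurs.
frequency = Counter(raw)

def tally_marks(n):
    # Each group of five is written "||||/" to stand for a crossed bundle.
    bundles = ["||||/"] * (n // 5)
    leftover = ["|" * (n % 5)] if n % 5 else []
    return " ".join(bundles + leftover)

print(f"{'Method':<6} {'Tally':<12} Frequency")
for code, name in names.items():
    print(f"{name:<6} {tally_marks(frequency[code]):<12} {frequency[code]}")
print(f"{'Total':<6} {'':<12} {sum(frequency.values())}")
```

The frequencies should add up to the number of students surveyed, which is a quick check that no code was missed or counted twice.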
Q1. The types of vehicles passing through a certain point during a 10-minute period were recorded as follows:

Code: Motorcycle (M), Car (C), Bus (B) and Truck (T)

CCMCC TBCCC TTMMC CTBCC CCTMC CCTCM

CCMTT CCCMC CCCCT CMBCC CCCTM TCCCC

Rearrange this information into a frequency distribution table using a tally column.

a) How many cars passed through this point?


b) What was the total number of the trucks?
c) How many more trucks than buses were there?

Q2. A particular class was surveyed to find out the number of pets per household and the data were recorded.
The raw data were: 0, 3, 1, 2, 0, 1, 0, 1, 2, 4, 0, 6, 1, 1, 0, 2, 2, 0, 1, 3, 0, 1, 2, 1, 1, 2.
a) Organise the data into a frequency distribution table.

b) How many households were included in the survey?


c) How many households have fewer than 2 pets?
d) Which is the most common number of pets?
e) How many households have 3 or more pets?
f) What fraction of those surveyed had no pets?

Q3. If the tally is fairly simple, the frequency table may be simplified to two columns. Use the simpler table
at right to answer these questions.

a) How many were surveyed?

b) What score had the lowest frequency?

c) What was the frequency of the lowest score?

d) How many scored at least 8?


e) What fraction of the total number surveyed scored less than 8?
Displaying data – Graphs
Graphs are very helpful when displaying and interpreting information. A graph is a visual way of presenting
data. Different types of graphs can be used for different purposes. There are many types of graph and it is
important to choose the most suitable.

Column graphs
A column graph is useful for comparing facts. The columns provide a visual display for
comparing quantities in different categories. Column graphs help us to see relationships
quickly.
When constructing column graphs, they should be drawn on graph paper and have:
1. a title or name
2. labelled axes which are clearly and evenly scaled
3. columns of the same width
4. an even gap between each column
5. the first column beginning half a unit (that is, half the column width) from the vertical axis.

Example 1: The column graph below shows the results of a survey of people in a street asking them their
favourite car colour.
a) The scale of this graph’s vertical axis is: 1 unit = ---------- people
b) What is the title of the graph?
c) What is represented by horizontal axis?
d) What is the most popular car colour?
e) How many more people preferred white than green?
f) Is the statement “Red is at least twice as popular as blue” true or false?
g) Choose the most correct: “Nobody chose silver as their favourite colour.”
The graph was incomplete. 18 people said that yellow was their favourite colour. Draw your own
column on the graph. Colour all columns.
h) How many people were surveyed? (Include the people who chose yellow.)
i) What fraction of people preferred white or red cars?

Q1. This is a graph of preferred leisure activities of a Year 7 class.

a) How many students preferred sport as a leisure activity?


b) How many students were in the class?
c) Which was the most favoured activity?
d) How many times more popular than reading was watching television?
e) Which two activities are closest in popularity?

Q2. This column graph represents the Jumpin’ Jeans company’s profits.

a) Which year showed the highest profit? How much was it?
b) In which year did losses start? What was the loss that year?
c) What was the profit or loss for 2003?
d) i) Find the total profits and the total losses.
ii) Calculate the company’s overall profit/loss over the
period shown.
e) In which year was the only improvement made and by how much?

Q3. An apple producer records his sales for a 12-week period.


a) How many boxes were sold in the first week?
b) How many boxes were sold in the fifth week?
c) How many boxes were sold in the eighth week?
d) The values for some weeks may be unusual. Which ones might
be unusual? Explain your answer.
e) What might cause unusual values in a graph like this?
f) Does the graph indicate that apple sales are improving?
Explain your answer.
Q4. In their physical education class the girls in a Year 7 class were asked to sprint for 10 seconds. The
teacher recorded their results on 2 different days.
a) Why are there 2 columns for each girl?
b) Which girl ran the fastest on either day?
c) How far did she run on each day?
d) Which girl improved the most?
e) Were there any students who did not improve?
Who were they?
f) Could this graph be misleading in any way?
Explain your answer.
g) Why might the graph’s vertical axis start at 30 m?

Q5. A survey of houses in Statistics Street produced the data shown in the table below.

Number of bedrooms    Number of houses
2                     3
3                     10
4                     6
5                     2

a) Select a suitable title and draw a column graph to display the data. Label the vertical axis Number of houses
and the horizontal axis Number of bedrooms.
b) Find the most common number of bedrooms in the houses of Statistics Street.
c) How many houses were surveyed in Statistics Street?

Line Graphs

Line graphs are used to display data or information that changes continuously over time. Line graphs allow
us to see overall trends such as an increase or decrease in data over time.
When constructing line graphs they must be drawn on graph paper and include:
1. a title
2. a horizontal axis that is evenly scaled and labelled (usually as time)
3. a vertical axis that is evenly scaled and labelled
4. a line or smooth curve that joins successive plotted points.
Example1. This table shows the variation in price of a share for 1 week in 2012. The prices were taken at the
close of trading at 4 pm.

a) Draw a line graph to show this information.


b) Using the graph, approximate the share value halfway through trading on Tuesday.

Solution:
a) Put time (day of the week) on the horizontal axis and price on the vertical axis.
Step 1: Mark an appropriate scale for the horizontal axis.
Step 2: Mark an appropriate scale for the vertical axis.
Step 3: Write a heading and label each axis.
Step 4: Plot the points and join them with straight lines.
b) Step 1: Locate the position ‘halfway through trading on Tuesday’ on the horizontal axis.
Step 2: Rule a line up to the graph, then across to the vertical axis.
Step 3: Read the value off the vertical axis: about 46 cents.
1. The height of a seedling was measured at the same time each day over a week.

a) Draw a line graph for this information.


i) What heading should go on the graph?

ii) What label should go on the horizontal axis?

iii) What label should go on the vertical axis?

iv) The first two coordinates to plot are (0, 4) and (1, 6). Write the remaining coordinates.

b) What was the initial height of the seedling? On what day was this?
c) Using the graph, estimate the height of the seedling after 5.5 days.

2. The weight of a baby at various ages is shown below.

a) Illustrate this information on a line graph.


b) Estimate the baby’s weight at 10 months.

3 This graph shows the outside temperature over a 24-hour period that starts at midnight.
a) What was the temperature at midday?

b) When was the hottest time of the day?

c) When was the coolest time of the day?

d) Use the graph to estimate the temperature at these times of the day.

i 4:00 am

ii 9:00 am

iii 1:00 pm

iv 5:00 pm

4. Oliver measures his pet dog’s weight over the course of a year. He gets the following results.
a) Draw a line graph showing this information, making sure the vertical axis has an equal scale from 0 kg to
10 kg.
b) Describe any trends or patterns that you see.
c) Oliver put his dog on a weight loss diet for a period of 3 months. When do you think the dog started the
diet? Justify your answer.
Example: Interpreting a travel graph

This travel graph shows the distance travelled by a cyclist over 5 hours.
a) How far did the cyclist travel in total?
b) How far did the cyclist travel in the first hour?
c) What is happening in the second hour?
d) When is the cyclist travelling the fastest?
e) In the fifth hour, how far does the cyclist travel?

Solution:
a 30 km
b 15 km
c At rest
d In the first hour. This is the steepest part of the graph.
e 5 km
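The reasoning behind these answers can be sketched in Python. The distances at 0, 1, 2 and 5 hours follow the solution above; the distances at 3 and 4 hours are assumed values, since the graph itself is not reproduced here.

```python
# Distance from the start (km) at 0, 1, 2, 3, 4 and 5 hours.
# The readings at 3 h and 4 h (20 km and 25 km) are assumptions.
distance = [0, 15, 15, 20, 25, 30]

# Distance covered in each hour = difference between consecutive readings.
per_hour = [later - earlier for earlier, later in zip(distance, distance[1:])]

total = distance[-1] - distance[0]

# The steepest section is the hour with the largest distance covered.
fastest_hour = per_hour.index(max(per_hour)) + 1   # hours counted from 1

# A flat section (no distance covered) means the cyclist is at rest.
rest_hours = [hour + 1 for hour, d in enumerate(per_hour) if d == 0]

print("km in each hour:", per_hour)
print("total:", total, "km")
print("fastest in hour", fastest_hour, "and at rest in hour(s)", rest_hours)
```

Because each section covers one hour, the distance covered in that hour is also the speed in km/h, which is why the steepest section is the fastest.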

1. This travel graph shows the distance travelled by a van over 6 hours.
a) How far did the van travel in total?
b) How far did the van travel in the first hour?
c) What is happening in the fourth hour?
d) When is the van travelling the fastest?
e) In the sixth hour, how far does the van travel?

2. This travel graph shows the distance travelled by a cyclist over 5 hours.
a) How far did the cyclist ride in total?
b) How far did the cyclist ride in the second hour?
c) During which hour did the cyclist ride the fastest?
d) For how long did the cyclist rest?

3. How far from home was Luke at:


a) 8 am? b) 9 am? c) 10 am?
d) 11 am? e) 9:30 am? f) 11:30 am?
g) 10:30 am? h) 8:30 am?
During what times was Luke resting (or stationary)?
4. Alana left home at 5 am and travelled out of town by bicycle until it got a flat tyre. She had to push the bike
to her uncle’s home, which she reached at 8 am.
How far from the school is:
a) Alana’s home? b) her uncle’s home?
How far did Alana travel between:
c) 5 am and 6 am?
d) 6 am and 8 am?
e) Was Alana travelling faster between 5 am and 6 am or between 6 am and 8 am?
f) When did Alana’s bicycle get a flat tyre?
g) How far did Alana travel altogether?
Other types of Line Graphs
A-The Conversion Graph

B-The step graph


Do the following questions:
1. Refer to the conversion graph above:

2. Refer to the step graph above:

Frequency histograms and frequency polygons


Frequency histograms and frequency polygons are graphical representations of a frequency distribution
table so that patterns can be observed more easily.
For example, the data below is represented as a frequency distribution table, a histogram and a frequency
polygon.
As a table
Number Frequency
0 57
1 29
2 31
3 61
4 26

■ A frequency histogram is a graphical representation of a frequency distribution table. It can be used when
the items are numerical.
■ The vertical axis (y-axis) is used to represent the frequency of each item.
■ Columns are placed next to one another with no gaps in between.
■ A half-column-width space is placed between the vertical axis and the first column of the histogram.
■ A frequency polygon is formed by joining the centres of each column in the histogram. It begins and ends
on the horizontal axis.
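As a rough illustration, the frequency table above can be turned into a sideways text histogram and a list of frequency polygon points in Python. The one-`#`-per-5 scale is an arbitrary choice, and placing the polygon's end points one score before the first column and one score after the last is an assumed convention for meeting the horizontal axis.

```python
# Frequency table from the text: score -> frequency.
table = {0: 57, 1: 29, 2: 31, 3: 61, 4: 26}

# Histogram drawn sideways: one '#' for every 5 in the frequency
# (rounded down), just to show the shape of the columns.
for score, freq in table.items():
    print(f"{score} | {'#' * (freq // 5)} {freq}")

# Frequency polygon: join the centre of the top of each column, and add
# a zero-frequency point before the first score and after the last one
# so the polygon begins and ends on the horizontal axis.
scores = sorted(table)
polygon = (
    [(scores[0] - 1, 0)]
    + [(s, table[s]) for s in scores]
    + [(scores[-1] + 1, 0)]
)
print(polygon)
```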
Dot plots
 A dot plot is a simple graphical way to present a small amount of data.
 Each score in the data set is marked with a dot on a number line.
 A dot plot is able to convey information more simply and clearly than a column graph.
 It is especially suitable when there are a large number of categories to be displayed.

Example1- A group of movie critics are asked to give a new movie a rating of between 1 and 5 stars. The
results were 5, 3, 4, 1, 4, 5, 3, 2, 3, 4. Show this information on a dot plot.
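One way to picture the dot plot for Example 1 is the text sketch below. It is only an illustration: it assumes single-digit scores so that the dots line up with the number line, and it prints one row of dots for each level of frequency.

```python
from collections import Counter

ratings = [5, 3, 4, 1, 4, 5, 3, 2, 3, 4]   # Example 1 data
counts = Counter(ratings)

lowest, highest = min(ratings), max(ratings)
tallest = max(counts.values())

# Build the plot from the top row down: put a dot above a rating if
# that rating occurs at least `level` times, otherwise leave a blank.
rows = []
for level in range(tallest, 0, -1):
    rows.append(" ".join("•" if counts[r] >= level else " "
                         for r in range(lowest, highest + 1)))

# The last row is the number line.
rows.append(" ".join(str(r) for r in range(lowest, highest + 1)))

print("\n".join(rows))
```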

Example2 – Over a 2-week period, the number of packets of potato chips sold from a vending machine each
day was recorded:
10, 8, 12, 11, 12, 18, 13, 11, 12, 11, 12, 12, 13, 14.

a) Draw a dot plot of the data.

b) Comment on the distribution.

Q1. The number of goals scored by a soccer team over a season is given below.

0, 2, 3, 1, 2, 5, 4, 1, 2, 0, 2, 3, 1, 1, 1,
a) Display the data in a dot plot.
b) Comment on the distribution.

Q2. Draw a dot plot for each of the following sets of data:
a)2, 0, 5, 1, 3, 3, 2, 1, 2, 3

b) 18, 22, 20, 19, 20, 21, 19, 20, 21

c) 49, 52, 60, 55, 57, 60, 52, 66, 49, 53, 61, 57, 66, 62, 64, 48, 51, 60.

Q3. Melanie played 22 competition basketball games last year. She threw these numbers of goals:
4, 5, 1, 2, 3, 0, 3, 9, 4, 6, 5, 4, 1, 1, 4, 4, 2, 5, 3, 1, 1, 0
a) Draw a dot plot representing this data.

b) Find the total number of goals she threw for the year.
c) Is there an outlier in this data?
Q4. Use the dot plot shown to complete the table.

Stem-and-leaf plots

Each piece of data in a stem plot is made up of two components: a stem and a leaf. For
example, the value 28 is made up of a tens component (the stem) and a units component (the
leaf) and would be written as:

2 | 8 = 28
Example 1. Prepare an ordered stem-and-leaf plot for each of the following sets of data:
a) 129, 148, 137, 125, 148, 163, 152, 158, 172, 139, 168, 121, 134.

Ordered stem-and-leaf plots

b) 1.6, 0.8, 0.7, 1.2, 1.9, 2.3, 2.8, 2.1, 1.6, 3.1, 2.9, 0.1, 4.3, 3.7, 2.6.
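The two data sets in Example 1 can be sorted into stems and leaves with a short Python sketch. The function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical; for set a) the leaf is the units digit, and for set b) the leaf is the tenths digit.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: sorted leaves}; the leaf is the last digit in units of leaf_unit."""
    plot = defaultdict(list)
    for v in values:
        scaled = round(v / leaf_unit)      # e.g. 1.6 with leaf_unit 0.1 -> 16
        stem, leaf = divmod(scaled, 10)    # 16 -> stem 1, leaf 6
        plot[stem].append(leaf)
    # List every stem from smallest to largest, even one with no leaves.
    return {s: sorted(plot[s]) for s in range(min(plot), max(plot) + 1)}

data_a = [129, 148, 137, 125, 148, 163, 152, 158, 172, 139, 168, 121, 134]
data_b = [1.6, 0.8, 0.7, 1.2, 1.9, 2.3, 2.8, 2.1, 1.6, 3.1, 2.9, 0.1, 4.3, 3.7, 2.6]

print("Key: 12 | 9 = 129")
for stem, leaves in stem_and_leaf(data_a).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")

print("Key: 1 | 6 = 1.6")
for stem, leaves in stem_and_leaf(data_b, leaf_unit=0.1).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Counting the leaves in each plot should give back the number of values in the data set (13 and 15 here).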

Q1. The following stem-and-leaf plot gives the age of members of a theatrical group.

a) How many people are in the theatrical group?


b) What is the age of the youngest member of the group?
c) What is the age of the oldest member of the group?
d) How many people are over 30 years of age?
e) What age is the most common in the group?
f) How many people are over 65 years of age?
Q2. The number of errors made each week by 30 machine operators is recorded below:
12, 2, 0, 10, 8, 16, 27, 12, 6, 1, 40, 16, 25, 3, 12, 31, 19, 22, 15, 7, 17, 21, 18, 32, 33, 12, 28, 31, 32, 14.
Prepare an ordered stem-and-leaf plot which displays the data.

Q3. Prepare an ordered stem-and-leaf plot for each of the following sets of data:
a) 1.2, 3.9, 5.8, 4.6, 4.1, 2.2, 2.8, 1.7, 5.4, 2.3, 1.9
b) 207, 205, 255, 190, 248, 248, 248, 237, 225, 239, 208, 244
c) 14.8, 15.2, 13.8, 13.0, 14.5, 16.2, 15.7, 14.7, 14.3, 15.6, 14.6, 13.9, 14.7, 15.1, 15.9, 13.9, 14.5

Sector graphs and divided bar graphs


A sector graph (also called a pie chart) consists of a circle divided into different sectors or ‘slices of pie’, where
the size of each sector indicates the proportion occupied by any given item. A divided bar graph is a rectangle
divided into different rectangles or ‘bars’, where the size of each rectangle indicates the proportion of each
item. Both types of graphs are suitable for categorical but not numerical data.

If a student is asked to describe how much time they spend each evening doing different activities, they could
present their results as either type of graph:

■ To calculate the size of each section of the graph, divide the value in a given category by the sum of all
category values. This gives the category’s proportion or fraction.
■ To draw a sector graph (also called a pie chart), multiply each category’s proportion or fraction by 360°
and draw a sector of that size.
■ To draw a divided bar graph, multiply each category’s proportion or fraction by the total width of the
rectangle and draw a rectangle of that size.
Example: Drawing a sector graph and a divided bar graph
On a particular Saturday, Sanjay measured the number of hours he spent on different activities.
Activity    TV        Internet    Sport      Homework
Time        1 hour    2 hours     4 hours    3 hours

Represent the table as:


a) a sector graph b) a divided bar graph
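The calculations needed before drawing either graph can be sketched in Python using Sanjay's hours. The bar width of 10 cm is an assumption chosen for convenience; any total width works the same way.

```python
# Sanjay's Saturday, in hours.
hours = {"TV": 1, "Internet": 2, "Sport": 4, "Homework": 3}
total = sum(hours.values())      # 10 hours altogether

BAR_WIDTH_CM = 10                # assumed total width of the divided bar

angles = {}
widths = {}
for activity, h in hours.items():
    # Fraction of the total, multiplied by 360 degrees for the sector graph.
    angles[activity] = h * 360 / total
    # The same fraction of the bar's total width for the divided bar graph.
    widths[activity] = h * BAR_WIDTH_CM / total
    print(f"{activity:<9} {angles[activity]:5.0f} degrees  {widths[activity]:4.1f} cm")
```

The sector angles should add to 360° and the rectangle widths should add to the full width of the bar.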

Do the following exercise:


Two-way Table
A two-way table of counts organises data about two categorical variables. Values of the row variable label
the rows that run across the table, and values of the column variable label the columns that run down the
table.
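Completing a two-way table means adding across each row, down each column, and then finding the grand total. The Python sketch below shows the idea with made-up counts of preferred pets (hypothetical values, not taken from the questions that follow).

```python
# Hypothetical counts for illustration only.
counts = {
    "Boys":  {"Cat": 5, "Dog": 8},
    "Girls": {"Cat": 7, "Dog": 6},
}
columns = ["Cat", "Dog"]

# Row totals: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column totals: add down each column.
column_totals = {c: sum(cells[c] for cells in counts.values()) for c in columns}

# The grand total is the same whether the row totals or the column totals are added.
grand_total = sum(row_totals.values())

print(row_totals, column_totals, grand_total)
```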
1) Boys and girls were asked what their eye colour was. The results are shown in the two-way table.

Blue eyes Brown eyes Total


Boys 7 12
Girls 9 10
Total
a) Complete the two-way table.
b) How many boys had blue eyes?
c) How many girls had brown eyes?
d) How many people had brown eyes?
2) Students were asked what their favourite subject was. The results are shown below.

History Arts Maths PE Total


Boys 4 2 6 14
Girls 12 6 3 6
Total
a) Complete the two-way table.
b) How many people were asked altogether?
c) How many people enjoyed PE the best?
d) How many girls liked art?
3) 56 students were asked about their eye colours and what their favourite hobby was, but the results
have only been half recorded. They can be found below.

Watching TV Playing sports Socialising Total


Brown 9 4 20
Blue 16 8
Green 1 2 5
Total 13 17
a) Complete the two-way table.

Which graph should I construct for different types of data?

Activity: match types of data with suitable types of graphs, then discuss.

Misleading Graphs
Example 1:
Solution: Graph B has exaggerated the increase in profit by not starting the scale on the vertical axis at zero
and by enlarging this scale.
Graph C has the opposite effect, diminishing the rate of increase by enlarging the horizontal scale.
Graph D, by using a smaller scale on the horizontal axis, gives a different impression again.
Graph E has an irregular scale on the vertical axis. Graphs A, C and D are fair, although each gives a different
impression. Graphs B and E are misleading.

The main causes of graphs being misleading are:


 The scale on the vertical axis does not start at zero.
 The scale on the vertical axis is irregular.
 The scale on the vertical axis is missing.
 The use of area and volume to create a false impression.
Do the following exercise:
1. Describe the misleading or poor features of the following graphs.
Analysing Data

Range
• The range is the difference between the highest and lowest scores.
Range = Highest score – Lowest score

Mode
• The mode is the outcome that occurs most often; it has the highest frequency.

Median
• After a set of scores has been arranged in order, the median is the ‘middle score’. This is only strictly true
if there is an odd number of scores.
• For an even number of scores, the median is the average of the middle two scores.
Mean
• The mean or average of a set of scores is the sum of all the scores divided by the number of scores.
Mean = Total of scores ÷ Number of scores
Example1. Explain which statistical measure is referred to in these statements.
a) The majority of people surveyed prefer Activ-8 sports drink. - Mode
b) The ages of fans at the Rolling Stones concert varied from 8 to 80. - Range
c) The average Australian family has 2.1 children. - Mean
Q1. Explain which statistical measure is referred to in these statements.
a) There was a 15° temperature variation during the day.
b) Children at this school are absent 3.4 days per semester, on average.
c) Most often you have to pay $79.95 for those sports shoes.
d) The average Australian worker earns about $470 per week.
e) A middle-income family earns about $35 000 per annum.
Example2. A class of 20 students scored the following marks (out of 10) in a mathematics test:

5 1 7 6 7 9 8 7 6 3
2 3 5 3 5 4 7 9 7 2
Find the mean, mode, median and range.

Mean = (5+1+7+6+7+9+8+7+6+3+2+3+5+3+5+4+7+9+7+2) / 20 = 106 / 20 = 5.3

Mode = 7

Median = (5+6) / 2 = 5.5

Range = 9 – 1 = 8
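
The working in Example 2 can be checked with a short Python sketch. It uses the standard `statistics` module; the variable names are chosen here for illustration and are not part of the worksheet.

```python
import statistics

# Marks (out of 10) for the 20 students in Example 2.
marks = [5, 1, 7, 6, 7, 9, 8, 7, 6, 3,
         2, 3, 5, 3, 5, 4, 7, 9, 7, 2]

mean = sum(marks) / len(marks)          # total of scores / number of scores
median = statistics.median(marks)       # averages the two middle scores when the count is even
mode = statistics.mode(marks)           # most frequent score
score_range = max(marks) - min(marks)   # highest score - lowest score

print(mean, median, mode, score_range)
```

Sorting is handled inside `statistics.median`, so the marks can be entered in the order they were recorded.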

Q2. Find the range, mean, median and mode for these simple ordered data sets.
a) 1, 2, 2, 2, 4, 4, 6 b) 1, 4, 8, 8, 9, 10, 10, 10, 12 c) 1, 5, 7, 7, 8, 10, 11
d) 3, 3, 6, 8, 10, 12 e) 7, 11, 14, 18, 20, 20 f) 2, 2, 2, 4, 10, 10, 12, 14
Q3. For the given data sets, find the:
i) mean ii) median iii) mode iv) range
a) 5,2, 4,1, 0, 6, 1, 2, 9, 6 b) 1,7, 1, 3, 2,6, 1,5, 9,10
Example3. Elio’s batting scores in last year’s cricket series were 65, 30, 0, 0, 0, and 80; while Gaetano’s
scores were 0, 30, 30, 80, 25 and 20 in the same matches.
a) Calculate the mean score for each player.
b) Calculate the median score for each player.
c) Which of the mean and median is the better measure of each player’s ability?
Q4. Frank scored 5, 7, 6, 8, 7 in a series of spelling tests, while Erica scored 8, 8, 6, 1, 9 in the same tests.
a) Calculate the mean for each.
b) Find the median for each.
c) Which is the better measure of their abilities?
Q5. The following scores were made by four teams in sports matches.
Jackals: 4, 0, 5, 9, 4, 8
Panthers: 7, 10, 10, 11, 10, 9
Wallabies: 2, 15, 1, 17, 10, 3
Tigers: 9, 10, 20, 25, 0, 14
a) Which team has the highest mean?
b) Which team shows the greatest range of scores?
c) Compare modal scores for Jackals and Panthers.
d) Find the median score for each team.
Q6. The hours a shop assistant spends cleaning the store in eight successive weeks are:
8, 9, 12, 10, 10, 8, 5, 10
a) Calculate the mean for this set of data.
b) Determine the score that needs to be added to this data to make the mean equal to 10.
Q7. Decide if the following data sets are bimodal.
a) 2, 7, 9, 5, 6, 2, 8, 7, 4 b) 1, 6, 2, 3, 3, 1, 5, 4, 1, 9 c) 10, 15, 12, 11, 18, 13, 9, 16, 17
Q8. A netball player scored the following number of goals in her 10 most recent games:
15, 14, 16, 14, 15, 12, 16, 17, 16, 15
a) What is her mean score?
b) What number of goals does she need to score in the next game for the mean of her scores to be 16?
Q9. Write down a set of 5 numbers which has the following values:
a) Mean of 5, median of 6 and mode of 7
b) Mean of 5, median of 4 and mode of 8
c) Mean of 4, median of 4 and mode of 4
d) Mean of 4.5, median of 3 and mode of 2.5
e) Mean of 1, median of 0 and mode of 0
Q10. This dot plot shows the frequency of households with 0, 1, 2 or 3 pets.
a) How many households were surveyed?
b) Find the mean number of pets correct to one decimal place.
c) Find the median number of pets.
d) Find the mode.
e) Another household with 7 pets is added to the list. Does this change the median? Explain.
Q11. Eight numbers have a mean of 9. Seven of the numbers are 9, 7, 10, 6, 11, 6 and 10.
Find the eighth number.
Grouped Data
Example4. For the set of scores, find the:
i) Mean
ii) Median
iii) Mode
iv) Range.

i) Mean = (14+40+54+80+44) / 25 = 232 / 25 = 9.28
ii) Median is the 13th score = 9
iii) Mode = 10
iv) Range = 11 – 7 = 4
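
The grouped-data working above can be sketched in Python. The frequency table for Example 4 is not reproduced in this text, so the frequencies below are an assumption: they are inferred from the products 14, 40, 54, 80 and 44, the total of 25 scores, and the range 11 – 7.

```python
# Frequency table for Example 4 (score -> frequency).
# ASSUMPTION: the original table is not shown; these frequencies are inferred
# from the products 14, 40, 54, 80, 44 and the total of 25 scores.
freq = {7: 2, 8: 5, 9: 6, 10: 8, 11: 4}

n = sum(freq.values())                                   # number of scores
mean = sum(score * f for score, f in freq.items()) / n   # sum of (score x frequency) / n

# Expand the table into an ordered list to locate the middle score.
scores = [score for score in sorted(freq) for _ in range(freq[score])]
mid = n // 2
median = scores[mid] if n % 2 == 1 else (scores[mid - 1] + scores[mid]) / 2

mode = max(freq, key=freq.get)        # score with the highest frequency
score_range = max(freq) - min(freq)   # highest score - lowest score

print(n, mean, median, mode, score_range)
```

With 25 scores the median is the 13th score in order, which is why the list is indexed at position 12 (counting from zero).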

Q9. Find the mean, median, mode and range of these scores.

a) b)

c)

Example5. Find the median, mode and range of the data presented in the following stem-and-leaf plots.

Mode = 172
Range = 185 – 142 = 43
Q10. Find the median, mode and range of the data presented in the following stem-and-leaf plots.
Clusters, gaps and outliers
Example1. Identify any clusters, outliers or gaps in the following sets of data.
a) Monthly rainfall: 25 mm, 16 mm, 6 mm, 27 mm, 28 mm, 96 mm
96 mm is an outlier. It is much larger than all the other data values. (There is a large gap between 96 and the
other scores.)
b) c)

b) There is a cluster of scores in the ‘fifties’.

c) This data has a gap between 2 and 5. No students made 3 or 4 mistakes.

Q1. Identify any clusters, outliers or gaps in the following sets of data.
a) 13, 14, 15, 15, 17, 104

Example2. a) Find the mean, median, mode and range of each set of scores.
i) 3, 5, 5, 7, 9
Mean = 29 / 5 = 5.8
Median = 5
Mode = 5
Range = 6
ii) 3, 5, 5, 7, 90
Mean = 110 / 5 = 22
Median = 5
Mode = 5
Range = 87
b) Draw a dot plot for each set of data and mark the position of the mean, median and mode.

c) Compare and discuss the use of the mean, median and mode as measures of central tendency for these data sets.
In the first data set, the mean, median and mode are all central and typical values of the scores.
In the second data set, the mean is no longer a central value as it is larger than 4 of the 5 scores.
The two sets of data are the same except for the last score. As the mean is calculated using the value
of every score, it is greatly affected by outliers. When the 9 in the first set is replaced by 90, much
larger than the other scores, the mean changes significantly from 5.8 to 22. The mean is not an
appropriate measure of central tendency if the data has an outlier.
Note: The median and mode remain unchanged despite the presence of the outlier in the second set of
data and are appropriate to use as measures of central tendency.
d) Discuss the use of range as a measure of spread.
i) The range is a good measure of the spread of the scores.
ii) The range is greatly affected by the outlier and is not a useful measure of the spread of this set of
scores.
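
The effect of the single outlier in Example 2 can be seen by running the same summary on both data sets. This is a minimal sketch using Python's standard `statistics` module; the helper name `summary` is made up for this illustration.

```python
import statistics

def summary(scores):
    """Return (mean, median, mode, range) for a list of scores."""
    return (
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mode(scores),
        max(scores) - min(scores),
    )

set_1 = [3, 5, 5, 7, 9]
set_2 = [3, 5, 5, 7, 90]   # same scores, with 9 replaced by the outlier 90

print("set i: ", summary(set_1))
print("set ii:", summary(set_2))
```

Only the mean and the range change between the two lines of output, because they are the only measures here that depend on the size of the largest score.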
Q2. a) Find the mean, median and mode of the scores in each data set.
i) 7, 9, 9, 10, 12 ii) 7, 9, 9, 10, 80
b) Draw a dot plot for each set of data and mark the position of the mean, median and mode.
c) Compare and discuss the use of the mean, median and mode as measures of central tendency for
these data sets.
d) Discuss the use of the range as a measure of spread.
Example3. a) The heights of students in a school netball team were measured and recorded as 166 cm,
170 cm, 168 cm, 67 cm, 170 cm, and 169 cm.
i) Calculate the mean, median and mode of this data.
Mean = (166+170+168+67+170+169) / 6 = 151.7 Note: The mean is not central or typical of the data.
Arrange the data in order: 67, 166, 168, 169, 170, 170
Median = (168+169) / 2 = 168.5 cm
Mode = 170
ii) Identify the outlier in this set of data.
Outlier = 67 cm
iii) Ignore the outlier and calculate the mean, median and mode of the remaining 5 scores.
166 cm, 170 cm, 168 cm, 170 cm, and 169 cm.
Mean = (166+170+168+170+169) / 5 = 843 / 5 = 168.6
Arrange the data in order: 166, 168, 169, 170, 170
Median = 169
Mode = 170
iv) Should the outlier be included when reporting the mean, median and mode for this data? Why or
why not?
In this case it is reasonably obvious that the value 67 cm is a measurement or recording error, as it is
not likely that any girl in the netball team would be 67 cm tall. In this case, the outlier could be
ignored and the mean, median and mode would then all be central and typical of the data.
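
A short Python sketch of part a) is given below, comparing the three measures with and without the suspect value. Dropping 67 cm is an assumption made for this illustration (the text treats it as a likely recording error), and the helper name `describe` is hypothetical.

```python
import statistics

heights = [166, 170, 168, 67, 170, 169]   # cm, as recorded in Example 3 a)

# ASSUMPTION for this sketch: 67 cm is treated as a recording error and dropped.
cleaned = [h for h in heights if h != 67]

def describe(data):
    """Return (mean rounded to 1 decimal place, median, mode)."""
    return (round(statistics.mean(data), 1),
            statistics.median(data),
            statistics.mode(data))

with_outlier = describe(heights)
without_outlier = describe(cleaned)
print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

The median and mode barely move when the outlier is removed, while the mean shifts by many centimetres.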
b) The scores of 5 students on a mechanical aptitude test were recorded as 18, 23, 21, 20, 52.
i) Calculate the mean, median and mode of this data.
Mean = (18+23+21+20+52) / 5 = 26.8 Note: The mean is not central or typical of the data.
Arrange the data in order: 18, 20, 21, 23, 52
Median = 21
There is no mode.
ii) Identify the outlier in this set of data.
Outlier = 52
iii) Ignore the outlier and calculate the mean, median and mode of the remaining 4 scores.
Mean = 20.5
Median = 20.5
There is no mode.
iv) Should the outlier be included when reporting the mean, median and mode for this data? Why or
why not?
In this case the outlier could be the result of one of the students having an exceptionally high
mechanical aptitude compared with the others, so the outlier should be included in the reporting even
though, by including it, the mean is not a central or typical value.
Note: The median is the best measure of central tendency with or without the outlier included.
Q3. A metal rod was measured by 6 students and the results were recorded as 112mm, 111mm, 110 mm, 13
mm, 112 mm, 112 mm.
a) Calculate the mean, median and mode of this data.
b) Identify the outlier in this set of data.
c) Ignore the outlier and calculate the mean, median and mode of the remaining 5 scores.
d) Should the outlier be included in reporting the mean, median and mode for this data? Why or why not?
Q4. The times (in minutes) taken to travel to work in a 5 day week were recorded as 17, 15, 16, 18, 55.
a) Calculate the mean, median and mode of this data.
b) Identify the outlier in this set of data.
c) Ignore the outlier and calculate the mean, median and mode of the remaining 4 scores.
d) Should the outlier be included in reporting the mean, median and mode for this data? Why or why not?
Variation of sample mean and proportion
A key factor in the use of samples is the determination of how large the sample should be in order to give a
good estimate of the properties of the whole population. Consider the following results, recorded in groups of
5, when a normal six-sided die is rolled 200 times. The results are summarised in the table below.
34664, 14624, 31362, 51242, 63611, 42553, 63144, 45213,
56443, 54346, 52415, 33663, 55244, 65132, 63514, 62453,
12646, 35236, 24546, 13251, 43356, 64132, 21634, 46323,
55651, 26435, 53142, 25145, 26513, 42214, 26563, 21264,
23245, 61224, 32616, 11326, 21621, 16231, 21652, 31643

Mean = 699 / 200 ≈ 3.5
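
The population figures for the 200 rolls can be recomputed from the listed groups. The sketch below is one way to do this in Python; the variable names are illustrative only.

```python
# The 200 die rolls, recorded in 40 groups of 5 as listed above.
groups = """
34664 14624 31362 51242 63611 42553 63144 45213
56443 54346 52415 33663 55244 65132 63514 62453
12646 35236 24546 13251 43356 64132 21634 46323
55651 26435 53142 25145 26513 42214 26563 21264
23245 61224 32616 11326 21621 16231 21652 31643
""".split()

# Flatten the groups into a list of 200 individual rolls.
rolls = [int(digit) for group in groups for digit in group]

population_mean = sum(rolls) / len(rolls)        # total of all rolls / number of rolls
proportion_of_6s = rolls.count(6) / len(rolls)   # share of rolls that show a 6

print(len(rolls), sum(rolls), population_mean, proportion_of_6s)
```

These two population values are the benchmarks against which the sample means and sample proportions in this section are compared.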

Example1. Consider the proportion of 6s in samples of size 5 from the previous results for die rolls. Compare
these with the population proportion.
Proportion of 6s in sample of 5 = (Number of 6s) / 5

a) Complete the following table.

b) i) What is the lowest proportion of 6s in these 5 samples?
Lowest = 0%
ii) What is the highest proportion of 6s in these 5 samples?
Highest = 40%
c) In how many of these samples is the proportion in the sample approximately the same as in the population?
There are 2 sample proportions (20% and 20%) that are approximately the same as the population proportion.
d) Do you think that a sample of size 5 is big enough to provide a good estimate of the proportion of 6s in the
population? Give a reason.
No. Only 2 of the 5 samples have proportions close to the population proportion of 18%.
Q1. Consider the proportion of 6s in some samples of size 10 for the data given in the introduction to this
section.
a) Complete the following table.
b) Complete the following.
i) The lowest proportion of 6s in these 5 samples is ___.
ii) The highest proportion of 6s in these 5 samples is ___.
c) In how many of these samples is the proportion in the sample approximately the same as in the population?
d) Do you think that a sample of size 10 is big enough to provide a good estimate of the proportion of 6s in
the population? Give a reason for your answer.

Q2. Consider the proportion of 6s in some samples of size 20 for the data given in the introduction to this
section.
a) Complete the following table.

b) i) What is the lowest proportion of 6s in these 5 samples?
ii) What is the highest proportion of 6s in these 5 samples?
c) In how many of these samples is the proportion in the sample approximately the same as in the population?
d) Do you think that a sample of size 20 is big enough to provide a good estimate of the proportion of 6s in
the population? Give a reason for your answer.
Example2. Combine the information from the 5 samples in Example 1 into one of size 25. Is the proportion of
6s in this sample a good estimate of the population proportion?
Combining the information from our 5 samples:
Proportion of 6s = (2+1+0+0+1) / (5+5+5+5+5) = 4 / 25 = 16%
Yes, this is close to the population proportion of 18%.
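
The pooling step in Example 2 can be sketched in Python as follows. The counts of 6s are taken from the working above; the variable names are illustrative.

```python
# Number of 6s observed in each of the five samples of size 5 (from Example 1).
sixes_per_sample = [2, 1, 0, 0, 1]
sample_size = 5

# Proportion of 6s within each individual sample.
sample_proportions = [count / sample_size for count in sixes_per_sample]

# Pool the five samples into one sample of size 25.
pooled_size = sample_size * len(sixes_per_sample)
pooled_proportion = sum(sixes_per_sample) / pooled_size

print(sample_proportions)   # varies widely from sample to sample
print(pooled_proportion)    # compare with the population proportion of 0.18
```

The individual proportions range from 0 to 0.4, while the pooled value sits much closer to the population proportion, which is the point of combining the samples.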
Q3. Combine the information from the 5 samples in question 1 into one of size 50. Complete the following to
find if the proportion of 6s in this sample is a good estimate of the population proportion.

Proportion of 6s = ___ / (10+10+10+10+10) = ___%

Q4. Combine the information from the 5 samples in question 2 into one of size 100. Is the proportion of 6s in
this sample a good estimate of the population proportion?
Example3. Consider the means of samples of size 5 taken from the data at the beginning of Section I and
compare this with the population mean.
a) Complete the table.

b) i) What is the lowest sample mean? Lowest = 2.4
ii) What is the highest sample mean? Highest = 4.6
c) In how many of these samples is the mean of the sample approximately the same as that of the population?
Population mean = 3.5
Let’s take ‘within 10% of’ to indicate ‘approximately the same as’.
10% of population mean = (10 / 100) × 3.5 = 0.35
Now 3.5 − 0.35 = 3.15 and 3.5 + 0.35 = 3.85, hence we will consider any sample means between 3.15
and 3.85 to be ‘approximately the same as’ the population mean.
There are two sample means (3.4 and 3.8) that are approximately the same as the population mean.
d) Do you think that a sample of size 5 is big enough to provide a good estimate of the mean of the
population? Give a reason.
No. Only 2 of the 5 sample means are approximately the same as the population mean.
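
The 'within 10%' check in part c) can be written as a short Python sketch. The five sample means are the ones used in this example (lowest 2.4, highest 4.6); the variable names are illustrative.

```python
population_mean = 3.5
sample_means = [4.6, 3.4, 3.8, 3.0, 2.4]   # the five sample means of size 5

tolerance = 0.10 * population_mean     # 'within 10%' of the population mean
lower = population_mean - tolerance    # about 3.15
upper = population_mean + tolerance    # about 3.85

# Keep only the sample means that fall inside the accepted interval.
close = [m for m in sample_means if lower <= m <= upper]
print(close, len(close))
```

The 10% tolerance is a choice made for this exercise; a stricter or looser tolerance would change how many samples count as 'approximately the same'.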
Q5. Consider the means of samples of size 10 and compare this with the population mean.
a) Complete the following table.

b) Complete the following.
i) The lowest sample mean is ___.
ii) The highest sample mean is ___.
c) In how many of these samples is the mean approximately the same as that of the population? Complete the
following:
Number of sample means that lie between 3.15 and 3.85 = ___
d) Do you think that a sample of size 10 is big enough to provide a good estimate of the mean of the
population? Give a reason for your answer.
Q6. Consider the means of samples of size 20 and compare them with the population mean.
a) Complete the following table.

b) i) What is the lowest sample mean?
ii) What is the highest sample mean?
c) In how many of these samples is the mean of the sample approximately the same as that of the population?
d) Do you think that a sample of size 20 is big enough to provide a good estimate of the mean of the
population? Give a reason for your answer.
Example4. Use the information in Example 3 as listed in the table below.
a) Find the mean of the sample means for the first:
i) 3 samples ii) 4 samples iii) 5 samples
Mean of first 3 sample means = (Sum of sample means) / 3 = (4.6+3.4+3.8) / 3 = 3.9

Mean of first 4 sample means = (Sum of sample means) / 4 = (4.6+3.4+3.8+3) / 4 = 3.7

Mean of first 5 sample means = (Sum of sample means) / 5 = (4.6+3.4+3.8+3+2.4) / 5 = 3.4
b) Is the mean of the sample means in part a) approximately the same as the population mean?
As the number of samples increases, the mean of the sample means gets closer to the mean of the population.
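
The running 'mean of sample means' in Example 4 can be sketched in Python. The values are the five sample means from Example 3, and the list name `running_means` is chosen for this illustration.

```python
sample_means = [4.6, 3.4, 3.8, 3.0, 2.4]   # sample means from Example 3
population_mean = 3.5

# Mean of the first 3, 4 and 5 sample means, rounded to 1 decimal place.
running_means = [round(sum(sample_means[:k]) / k, 1) for k in (3, 4, 5)]

for k, value in zip((3, 4, 5), running_means):
    gap = round(abs(value - population_mean), 1)   # distance from the population mean
    print(f"first {k} samples: mean of sample means = {value}, gap = {gap}")
```

With only five samples this is a small illustration rather than a proof, but the gap to the population mean shrinks at each step.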
Q7. Use the information in question 5.
a) Find the mean of the sample means for the first:
i) 3 samples ii) 4 samples iii) 5 samples.
b) Is the mean of the sample means approximately the same as the population mean?
Q8. Use the information in question 6.
a) Find the mean of the sample means for the first:
i) 3 samples ii) 4 samples iii) 5 samples.
b) Is the mean of the sample means approximately the same as the population mean?
