Data Rep Slides


10/1/2021

Contents

Types of Data
MEASURES OF LOCATION AND SPREAD
DISPLAYING DATA: STEM AND LEAF
BOX-AND-WHISKER PLOTS
HISTOGRAMS
CUMULATIVE FREQUENCY

Discrete and continuous data

Numerical data can be discrete or continuous.

Discrete data can only take certain values.
For example: shoe sizes
the number of children in a class
the number of sweets in a packet.

Continuous data comes from measuring and can take any value within a given range.
For example: the weight of a banana
the time it takes for pupils to get to school
the height of 13 year-olds.

Measures of …

Measures of Location: Measures of Central Tendency (Mode, Median, Mean), Minimum, Maximum, Quartiles, Percentiles, Deciles
Measures of Spread: Range, Interquartile Range, Variance, Standard Deviation

Measures of location are single values which describe a position in a data set.
Of these, measures of central tendency are to do with the centre of the data, i.e. a notion of ‘average’.
Measures of spread are to do with how data is spread out.


The Mean
The mean is the most commonly used average.
To calculate the mean of a set of values we add together the values and divide by the total number of values.

Mean = Sum of values ÷ Number of values

For example, the mean time for Class 10B girls is:

(12.8 + 14.7 + 15.3 + 15.4 + 15.4) ÷ 5 = 73.6 ÷ 5 = 14.72

Median
• Important measure of central tendency
• In an ordered array, the median is the “middle” number.
For a set of n numbers arranged in ascending order:
– If n is odd, the median is the middle number.
– If n is even, the median is the average of the 2 middle numbers.
• Not affected by extreme values

Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data
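The three averages can be checked with Python's standard `statistics` module. This is only a sketch: it reuses the Class 10B times from the mean example, and the variable names are chosen here for illustration.

```python
from statistics import mean, median, multimode

# Class 10B girls' times, taken from the mean example
times = [12.8, 14.7, 15.3, 15.4, 15.4]

time_mean = round(mean(times), 2)   # sum of values divided by number of values
time_median = median(times)         # middle value of the ordered data (n is odd)
time_modes = multimode(times)       # list of every most-frequent value

print(time_mean, time_median, time_modes)
```

`multimode` returns a list because a data set may have several modes.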


Quartiles

Interquartile Range
• Measure of variation
• Also known as midspread: the spread in the middle 50%
• Difference between the third and first quartiles: Interquartile Range = Q3 − Q1
Data in ordered array: 11 12 13 16 16 17 17 18 21
Q3 − Q1 = 17.5 − 12.5 = 5
• Not affected by extreme values

Range

Variance
Variance is a measure of dispersion found by averaging the squares of the deviations (distances from the mean) of each piece of data.
• Calculating Variance

Standard Deviation
Standard deviation is a summary measure of the differences of each observation from the mean.
Standard deviation is the square root of the variance.


Outliers

Percentiles

Shape
• Describes how data are distributed
• Measures of shape: symmetric or skewed
• If mean = median = mode, the shape of the distribution is symmetric.
• If mode < median < mean, the distribution trails to the right and is positively skewed.
• If mean < median < mode, the distribution trails to the left and is negatively skewed.

Left-skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean


CHOOSING MEASURES AND DIAGRAM


MEASURE OF SPREAD


Example 9709/06/O/N/08 Q1 Example 9709/06/M/J/12 Q1

Rachel measured the lengths in millimetres of some of the leaves on a tree. Her results are
recorded below.
32 35 45 37 38 44 33 39 36 45
Find the mean and standard deviation of the lengths of these leaves.
[3]
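A possible check of this question in Python is sketched below. It assumes the standard deviation divides by n (the population form, matching the description of variance as the average of the squared deviations). The function name `mean_and_sd` is made up for this sketch.

```python
from math import sqrt

def mean_and_sd(values):
    """Return (mean, standard deviation), dividing by n (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    # Variance as the mean of the squares minus the square of the mean
    variance = sum(x * x for x in values) / n - mean ** 2
    return mean, sqrt(variance)

# Rachel's leaf lengths in millimetres
leaves = [32, 35, 45, 37, 38, 44, 33, 39, 36, 45]
leaf_mean, leaf_sd = mean_and_sd(leaves)
print(round(leaf_mean, 1), round(leaf_sd, 2))
```

Here Σx = 384 and Σx² = 14954, so the mean is 38.4 and the variance is 1495.4 − 38.4² = 20.84.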

TRY
9709/62/O/N/11 Q1

The following are the times, in minutes, taken by 11 runners to complete a 10 km run.
48.3 55.2 59.9 67.7 60.5 75.6 62.5 57.4 53.4 49.2 64.1
Find the mean and standard deviation of these times. [3]


COMBINED SETS OF DATA

In general for two sets of data, x and y

Worked example 9709/06/O/N/04 Q4

Worked example 2

The table below shows the mean and standard deviation of the heights of 20 boys and 30 girls.
Find the mean and standard deviation of the heights of the 50 children.

         Mean      Standard deviation
Boys     160 cm    4 cm
Girls    155 cm    3.5 cm
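One way to combine the two groups is to recover Σx and Σx² for each group from its size, mean and standard deviation, then pool the totals. The sketch below assumes standard deviations that divide by n. The helper name `combine` is hypothetical.

```python
from math import sqrt

def combine(groups):
    """groups: list of (n, mean, sd) tuples. Returns (mean, sd) of the pooled data."""
    total_n = sum(n for n, _, _ in groups)
    total_x = sum(n * m for n, m, _ in groups)                    # sum of x is n * mean
    total_x2 = sum(n * (s ** 2 + m ** 2) for n, m, s in groups)   # sum of x^2 is n * (sd^2 + mean^2)
    mean = total_x / total_n
    return mean, sqrt(total_x2 / total_n - mean ** 2)

# 20 boys (mean 160 cm, sd 4 cm) and 30 girls (mean 155 cm, sd 3.5 cm)
height_mean, height_sd = combine([(20, 160, 4), (30, 155, 3.5)])
print(height_mean, round(height_sd, 2))
```

For the 50 children this gives a mean of 157 cm and a variance of 19.75 cm², so the standard deviation is about 4.44 cm.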


TRY

CODED DATA

           Number of birds    Mean (kg)    Standard deviation (kg)
Turkeys    9                  7.1          1.45
Geese      18                 5.2          0.96


Solution

Solution


Stem & leaf diagram

Objectives
1. Draw a stem and leaf diagram
2. Use the stem and leaf diagram to find:
The median
The mode
The range
The quartiles

What is a stem & leaf diagram?

A stem and leaf plot is a method used to organize statistical data into a tabular form. The greatest common place value of the data is used to form the stem. The next greatest common place value is used to form the leaves.

Example 1

The data shows the test scores of an Algebra test. Draw a stem and leaf plot of the test scores.
Solution:

Stem | Leaf
5 | 6 9
6 | 4 6 9
7 | 0 1 3 6 7 8
8 | 0 2 2 5 6
9 | 1 1 2 2 5 8 9


Example 2
The data below shows the history test scores for 15 students.
65, 63, 88, 82, 73, 74, 91, 95, 88, 87, 86, 78, 69, 80, 88
Draw a stem and leaf diagram for the data and use it to find;
(a) The median score
(b) The modal score
(c) The range of the scores
Solution:
1st put the scores in numerical order.
63, 65, 69, 73, 74, 78, 80, 82, 86, 87, 88, 88, 88, 91, 95

Stem | Leaf
6 | 3 5 9
7 | 3 4 8
8 | 0 2 6 7 8 8 8
9 | 1 5

(a) The median is 82
(b) The mode is 88
(c) Range = 95 – 63 = 32

Example 3

Stem | Leaf
0 | 5 6 8 9
1 | 2 5 7 8
2 | 0 1 2 6
3 | 0 3
4 | 2 5
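The Example 2 diagram and its answers can be reproduced with a short Python sketch. It assumes the stem is the tens digit and the leaf is the units digit, which suits two-digit scores. The function name `stem_and_leaf` is chosen here for illustration.

```python
from collections import defaultdict
from statistics import median, multimode

def stem_and_leaf(values):
    """Group values by tens (stem) with their units (leaf), in ascending order."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return dict(rows)

# History test scores from Example 2
scores = [65, 63, 88, 82, 73, 74, 91, 95, 88, 87, 86, 78, 69, 80, 88]
diagram = stem_and_leaf(scores)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(map(str, leaves)))

score_median = median(scores)
score_modes = multimode(scores)
score_range = max(scores) - min(scores)
print(score_median, score_modes, score_range)
```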

Example 4
Draw a stem and leaf for the data below;
100, 125, 106, 111, 110, 126, 128, 135
109, 133, 135, 118, 119, 120, 142, 148
Solution:
1st put the data in numerical order.
100, 106, 109, 110, 111, 118, 119, 120
125, 126, 128, 133, 135, 135, 142, 148

Stem | Leaf
10 | 0 6 9
11 | 0 1 8 9
12 | 0 5 6 8
13 | 3 5 5
14 | 2 8

Back to back stem & leaf diagram
This is used to draw the stem and leaf for two data sets. The two data sets share the same stem.
It can be used to compare the two data sets.


Example 5
The data below shows the scores in two Math tests.
Test 1: {88, 85, 62, 66, 83, 91, 95, 89, 65, 52, 76, 63, 88, 84, 83, 90, 91, 97}
Test 2: {52, 55, 62, 89, 90, 91, 84, 83, 71, 73, 78, 64, 66, 68, 70, 75, 73, 65}
Draw a back to back stem and leaf diagram for the test scores.
Find;
(a) The range of test 1
(b) The range of test 2
(c) Which test is more varied

Solution:
by 1st ordering the scores;
Test 1: {52, 62, 63, 65, 66, 76, 83, 83, 84, 85, 88, 88, 89, 90, 91, 91, 95, 97}
Test 2: {52, 55, 62, 64, 65, 66, 68, 70, 71, 73, 73, 75, 78, 83, 84, 89, 90, 91}

      Test 2 | Stem | Test 1
         5 2 |  5   | 2
   8 6 5 4 2 |  6   | 2 3 5 6
 8 5 3 3 1 0 |  7   | 6
       9 4 3 |  8   | 3 3 4 5 8 8 9
         1 0 |  9   | 0 1 1 5 7

(a) Range of test 1 = 97 – 52 = 45
(b) Range of test 2 = 91 – 52 = 39
(c) Test 1 is more varied because the range is bigger

Example
The number of cars parked in a car park at 9 am is recorded for 10 days.
124 130 129 116 132 120 127 107 118 114
Complete the stem-and-leaf diagram.
Solution:

Example 7

Solution:

BOX-AND-WHISKER PLOTS


What is a box-and-whisker plot?

This is a diagram that summarises the data using the “Five Number Summary” of the data.

The Five Number Summary is:
1. The least value of the data
2. The greatest value of the data
3. The median
4. The lower quartile
5. The upper quartile

[Figure: box-and-whisker plot labelled Least value, Low. Quart., Median, Upp. Quart., Greatest value]

Example 1

Draw a box and whisker plot for the data below.
8, 4, 10, 6, 2, 3, 2, 7, 4, 9
Solution:
1st put the numbers in order;
2, 2, 3, 4, 4, 6, 7, 8, 9, 10
Least value = 2
Greatest value = 10
Median = 5
Lower quartile = 3
Upper quartile = 8
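The five-number summary for Example 1 can be sketched in Python as below. Quartile conventions vary. This sketch assumes the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd, and that convention reproduces the values quoted in this example. The function name is made up for the sketch.

```python
from statistics import median

def five_number_summary(values):
    """Return (least, lower quartile, median, upper quartile, greatest)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # skips the middle value when n is odd
    return min(data), median(lower), median(data), median(upper), max(data)

summary = five_number_summary([8, 4, 10, 6, 2, 3, 2, 7, 4, 9])
print(summary)
```

Other software may use a different quartile rule and return slightly different quartiles for the same data.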

Example 1 (different form)
Dots can also be used for the least and greatest values as;
[Figure: box-and-whisker plot with dots marking the least and greatest values]

Example 2
The summary of an examination is as follows;
Highest mark = 92
Median = 56
Lower quartile = 40
Upper quartile = 85
Range = 56
Draw a box and whisker plot for the data
Solution:
Least mark = 92 – 56 = 36


Example 3
A random sample of 25 people recorded the number of glasses of water they drank in a particular week. The results are shown below. Draw a box and whisker plot for the data.
14 15 17 18 19
19 21 22 22 23
24 25 26 26 28
28 30 32 36 38
41 42 45 46 47
Solution:
Lowest = 14
Greatest = 47
Median = 26
LQ = 20
UQ = 37

Example 4
A health worker plotted a box-and-whisker plot of the number of days patients stayed in the hospital.
Using the diagram, find;
(a) The greatest number of days a patient stayed in the hospital
(b) The lower quartile of the days a patient stayed in the hospital
(c) The range of the number of days.
Solution:
(a) 16 days
(b) 3 days
(c) Range = 16 – 2 = 14

Example
These box-and-whisker plots show the monthly electricity costs for 100 different households who use Electro company or Spark company.
Tom says that the monthly costs with Electro company are lower and vary less than with Spark company.
Is Tom correct? Justify your answer with reference to the box-and-whisker plots.
Solution:
The median is greater for Electro, so Spark is cheaper; Tom is wrong about the costs being lower.
The interquartile range is greater for Spark, so Electro is less varied; Tom is right about the costs varying less.

Example 6
The stem-and-leaf diagram shows the cholesterol count for a group of 43 people who do not exercise daily.
You are given that the lower quartile, median and upper quartile of the cholesterol count for the group are 5.2, 6.5, and 8.3 respectively. Draw a box and whisker plot for the data.
Solution:


Example 7
The marks of the pupils in a certain class in a History examination are as follows.
28 33 55 38 42 39 27 48 51 37 57 49 33
The marks of pupils in a Physics examination are summarised as follows.
Lower quartile: 28, Median: 39, Upper quartile: 67
The lowest mark was 17 and the highest mark was 74.
(a) Draw box-and-whisker plots for both examinations on the same graph.
(b) State one difference, which can be seen from the diagram, between the marks for History and Physics.
Solution:

History:
Lowest mark = 27
Median = 39
Lower quartile = 33
Upper quartile = 50
Greatest mark = 57

[Figure: box-and-whisker plots for Physics and History on the same scale]

(b) The Physics marks are more varied.

HISTOGRAMS

Histograms often have bars of different widths, so the height of each bar must be adjusted in accordance with the width of the bar.
Grouped continuous data can be represented on a histogram.
In a histogram, the area of each bar is proportional to the frequency.
The bars can have different widths. (The width of a bar is called the class width.)
There are no gaps between the bars.
The vertical axis represents the frequency density.

Frequency density = frequency ÷ class width
Area = Frequency = frequency density × class width

MEAN AND STANDARD DEVIATION FROM GROUPED DATA

Time (t minutes)    Frequency (f)
18 < t ≤ 19         4
19 < t ≤ 20         11
20 < t ≤ 21         22
21 < t ≤ 22         13

In a grouped frequency table you do not know the individual data values so you can only estimate the mean.
The modal class is the interval with the greatest frequency or frequency density. In a histogram it is the interval with the highest bar.


1 The table shows information about the times of 50 runners in a race.
b Estimate the mean time taken.

Time (t minutes)    Frequency (f)    Mid-value (x)    f × x
18 < t ≤ 19         4                18.5             4 × 18.5 = 74
19 < t ≤ 20         11               19.5             11 × 19.5 = 214.5
20 < t ≤ 21         22               20.5             22 × 20.5 = 451
21 < t ≤ 22         13               21.5             13 × 21.5 = 279.5

b Mean = 1019 ÷ 50 = 20.38 minutes

2 The table shows information about the heights of 40 basketball players.
b Estimate the mean height.

Height (h cm)       Frequency (f)    Mid-value (x)    f × x
160 < h ≤ 170       2                165              2 × 165 = 330
170 < h ≤ 180       19               175              19 × 175 = 3325
180 < h ≤ 190       15               185              15 × 185 = 2775
190 < h ≤ 200       4                195              4 × 195 = 780

b Mean = 7210 ÷ 40 = 180.25 cm
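The mid-value method in both tables can be sketched in Python. The function below also returns an estimated standard deviation using the same mid-values and dividing by the total frequency. That extension is an assumption of this sketch, not a result quoted in the tables. The name `grouped_estimates` is hypothetical.

```python
from math import sqrt

def grouped_estimates(classes):
    """classes: list of (lower, upper, frequency). Returns estimated (mean, sd)."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]   # mid-value of each class
    mean = sum(f * x for x, f in mids) / n
    variance = sum(f * x * x for x, f in mids) / n - mean ** 2
    return mean, sqrt(variance)

times = [(18, 19, 4), (19, 20, 11), (20, 21, 22), (21, 22, 13)]
heights = [(160, 170, 2), (170, 180, 19), (180, 190, 15), (190, 200, 4)]
time_mean, time_sd = grouped_estimates(times)
height_mean, height_sd = grouped_estimates(heights)
print(round(time_mean, 2), round(height_mean, 2))
```

The means agree with the worked values of 20.38 minutes and 180.25 cm.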

1 The lengths of 20 phone calls are recorded. Draw the histogram.
Add columns to the table to show the class widths and to calculate the frequency density.

Length, t (min)    Frequency    Class width    Frequency density
0 ≤ t < 3          6            3              6 ÷ 3 = 2
3 ≤ t < 5          7            2              7 ÷ 2 = 3.5
5 ≤ t < 10         7            5              7 ÷ 5 = 1.4

[Figure: histogram of frequency density (0 to 4) against Time (min) (0 to 10)]

2 The heights of 35 seedlings are recorded. Draw the histogram.
Add columns to the table to show the class widths and to calculate the frequency density.

Height, h (cm)     Frequency    Class width    Frequency density
0 ≤ h < 2          7            2              7 ÷ 2 = 3.5
2 ≤ h < 8          24           6              24 ÷ 6 = 4
8 ≤ h < 10         4            2              4 ÷ 2 = 2

[Figure: histogram of frequency density (0 to 4) against Height (cm) (0 to 10)]
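The frequency density columns in the phone call and seedling tables can be reproduced with a few lines of Python. This is a sketch, and the function name is chosen here for illustration.

```python
def frequency_densities(classes):
    """classes: list of (lower, upper, frequency). Returns frequency / class width."""
    return [f / (hi - lo) for lo, hi, f in classes]

calls = [(0, 3, 6), (3, 5, 7), (5, 10, 7)]
seedlings = [(0, 2, 7), (2, 8, 24), (8, 10, 4)]
call_fd = frequency_densities(calls)
seedling_fd = frequency_densities(seedlings)
print(call_fd, seedling_fd)
```

Multiplying each density by its class width gives back the frequency, which is the area of the bar.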


1 The lengths of 20 phone calls are recorded. Draw the histogram.
Add columns to the table to show the new class boundaries, the class widths and to calculate the frequency density.

Length, t (min)    Class boundaries    Frequency    Class width    Frequency density
1 ≤ t < 3          0.5 ≤ t < 3.5       6            3              6 ÷ 3 = 2
4 ≤ t < 5          3.5 ≤ t < 5.5       7            2              7 ÷ 2 = 3.5
6 ≤ t < 10         5.5 ≤ t < 10.5      7            5              7 ÷ 5 = 1.4

[Figure: histogram of frequency density against Time (min), horizontal axis from 0.5 to 10.5]

9709/63/O/N/12
4 In a survey, the percentage of meat in a certain type of take-away meal was found. The results, to the nearest integer, for 193 take-away meals are summarised in the table.
(i) Calculate estimates of the mean and standard deviation of the percentage of meat in these take-away meals. [4]
(ii) Draw, on graph paper, a histogram to illustrate the information in the table. [5]

Solution
ii.

% of meat    Class boundaries

CUMULATIVE FREQUENCY


The frequency table shows information about the length of 20 journeys.

Distance (km)     Frequency
0 < d ≤ 20        2
20 < d ≤ 40       3
40 < d ≤ 60       9
60 < d ≤ 80       5
80 < d ≤ 100      1

From the frequency table you can make a cumulative frequency table.

Distance (km)     Cumulative frequency
≤ 20              2
≤ 40              2 + 3 = 5
≤ 60              2 + 3 + 9 = 14
≤ 80              2 + 3 + 9 + 5 = 19
≤ 100             2 + 3 + 9 + 5 + 1 = 20

From the cumulative frequency table you can draw a cumulative frequency graph.

Distance (km)     Cumulative frequency
≤ 20              2
≤ 40              5
≤ 60              14
≤ 80              19
≤ 100             20

[Figure: cumulative frequency graph, cf (0 to 20) against distance (km) (0 to 100)]
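The running totals in the cumulative frequency table can be produced with `itertools.accumulate`. This is a small sketch using the journey data, with variable names chosen here for illustration.

```python
from itertools import accumulate

upper_bounds = [20, 40, 60, 80, 100]   # upper class boundary of each interval (km)
frequencies = [2, 3, 9, 5, 1]

cumulative = list(accumulate(frequencies))    # running total of the frequencies
points = list(zip(upper_bounds, cumulative))  # points to plot on the graph
print(points)
```

Each cumulative frequency is plotted against the upper boundary of its class.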

To find the median: Find ½ of the cumulative frequency. 20 ÷ 2 = 10
Read across to the curve and down.
[Figure: cumulative frequency graph, cf against distance (km), read across from cf = 10]
Median = 51

To find the lower quartile: Find ¼ of the cumulative frequency. 20 ÷ 4 = 5
Read across to the curve and down.
[Figure: cumulative frequency graph, cf against distance (km), read across from cf = 5]
Lower quartile = 40


To find the upper quartile: Find ¾ of the cumulative frequency. 20 × ¾ = 15
Read across to the curve and down.
[Figure: cumulative frequency graph, cf against distance (km), read across from cf = 15]
Upper quartile = 62

To find the inter-quartile range: Find the difference between the upper quartile and the lower quartile.
[Figure: cumulative frequency graph, cf against distance (km), with the lower quartile, median and upper quartile marked]
Inter-quartile range = 62 − 40 = 22
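Reading across and down can be approximated numerically by joining the plotted points with straight lines. That is an assumption of this sketch, since the slides read values from a smooth curve. The starting point (0, 0) is also assumed, taken from the lower boundary of the first class. The function name `read_off` is hypothetical.

```python
def read_off(points, target):
    """Linearly interpolate the distance at a given cumulative frequency."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target outside the plotted range")

# (distance, cumulative frequency); the point (0, 0) is assumed
points = [(0, 0), (20, 2), (40, 5), (60, 14), (80, 19), (100, 20)]
median = read_off(points, 10)
lower_q = read_off(points, 5)
upper_q = read_off(points, 15)
iqr = upper_q - lower_q
print(round(median, 1), lower_q, upper_q, iqr)
```

Straight-line interpolation gives a median of about 51.1 and a lower quartile of 40, close to the graph readings. It gives an upper quartile of 64 rather than the 62 read from the curve, so values taken from a hand-drawn curve should be treated as estimates.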
