MATH210 - Stats Custom Text
6.3 - Estimating Population Proportion for Large and Small Samples
6.4 - Hypothesis Testing of Population Mean of a Large Sample
6.5 - Hypothesis Testing of Population Mean of a Small Sample
6.6 - Hypothesis Testing of Population Proportion of a Large Sample
Lab Activities
Answers
Unit 1: Graphical Representation of Data
Statistics is a collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on
data. It allows us to draw inferences about populations by analyzing samples of those
populations.
Plan experiment → Collect data → Organize, summarize, and present → Analyze, interpret, and draw conclusions
There are different types of data. First, we can distinguish between Quantitative data and
Qualitative data.
Quantitative data is numerical (represents counts or measurements).
Qualitative data is categorical.
For example, ages of election candidates would be quantitative data. The political party
affiliations of the election candidates would be qualitative data. They could be affiliated with
the Liberal Party, Conservative Party, New Democratic Party, or the Green Party. These are
considered to be qualitative data because they are not numerical or quantifiable. By contrast, a
survey that counts the number of people supporting the Liberal Party would produce quantitative
data, because the result is a count.
Quantitative data can be further distinguished as either discrete or continuous data.
Discrete data results from either a finite number of possible values or a countable
number of values.
Continuous data results from infinitely many possible values that can be associated with
points on a continuous scale, in such a way that there are no gaps or interruptions.
For example, the number of students studying statistics is discrete data. The number of
students is counted, so its values are always whole numbers. There will never be values such
as 2.5 or 3.678 when we discuss the number of students in a class.
Continuous data are usually values that are measured, such as temperature, volume, time,
weight, and height. These data values can be associated with points on a continuous scale.
Quantitative data that are collected can also be either univariate data or bivariate data.
Univariate data are data that have one variable.
Bivariate data are data that have two variables.
This means that if the height of all the students in the class was collected, the data set would be
univariate since height is the only variable. If the height and weight of all the students in the
class are collected, the data set would be bivariate since there are two variables, height and
weight. Collecting bivariate data usually serves the purpose of determining the relationship
between the two variables. For example, collecting data on number of cigarettes smoked per
day and lifespan of the smoker would allow us to determine the relationship between the
amount of smoking and how long a smoker would live.
It is important to understand types of data because appropriate graphical representations of
data would depend on the type of data.
Bar Graph
Bar graphs are appropriate for displaying
frequency in each category.
This bar graph shows the number of students
who had chosen each of the sports as their
favorite sport.
Soccer, softball, basketball, and other sport are
qualitative groups.
The number of students is the frequency in
each qualitative group.
Bar graphs allow a visual comparison among
the different groups.
Pictograms
Pictograms are similar to bar graphs. They
show qualitative categories and the frequency
of each category. Rather than using bars to
show frequency, a picture is usually used to
represent the frequency.
In this example, the pictogram shows that
there are 2000 visitors in January and that March
has the highest number of visitors.
Circle Graph
Circle graphs, also known as Pie Charts, are
appropriate for showing the portion of a
category with respect to a whole.
In this example, each category of the budget is
shown as a section of the circle that represents
the portion it is with respect to the total
amount of the budget.
Scatterplot
Scatterplots are appropriate to display
bivariate data. They are helpful in describing
relationships between the two variables of the
data.
In this case, the two variables are grip strength
and arm strength. Each point represents a
single person the data was collected from. One
could conclude from this visual representation
that in general, the greater the grip strength a
person has, the greater the arm strength they
would have as well.
Line graphs
Practice
c) The distances measured between the target and the dart shot at the target
2) Determine the graphical representation that is the most appropriate for each data set.
d) Mercury levels in fish and mercury levels in the water the fish live in
f) Number of students in each student club at Centennial College with respect to the total
number of students who belong in clubs.
4) Given the following collection of data, complete the following questions:
1.2 – Frequency Polygon
Frequency of data refers to the number of times distinct values occur in a collection of data.
A frequency distribution shows how data are partitioned among several categories or classes
by listing the categories along with the number of data values in each of them.
Example 1
Create a frequency distribution table for the following set of data.
12 13 11 9 12 15 16 17 18 20 10
Observing the data set, a reasonable interval is 5. So, begin by creating the following
classes.
Classes Frequency
5-9
10-14
15-19
20-24
Notice that every class has the same interval length. From 5 to 9, five values are
included. From 10 to 14, five values are also included. This means that the class width is 5.
In each class, the smaller number is known as the lower class limit, while the larger
number is known as the upper class limit. In the class 20-24, 20 is the lower class limit and
24 is the upper class limit. The difference between a lower class limit and the
subsequent lower class limit should always be equal to the class width.
Another important feature is that the upper class limit of one class and the lower class limit
of the next class will never be the same. That is to say, the class after 5-9 will never be 9-14;
it will always begin with 10. This prevents a data value from being sorted into two
classes and counted more than once in the frequency.
To complete the frequency distribution table, count the number of values from the data
set that fall into each class.
Classes Frequency
5-9 1
10-14 5
15-19 4
20-24 1
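The counting step can also be checked with a few lines of Python. This is only a sketch using the data and classes of Example 1; the variable names (lowest_limit, class_width, frequency) are our own and are not part of the text.

```python
# Data and classes taken from Example 1.
data = [12, 13, 11, 9, 12, 15, 16, 17, 18, 20, 10]
lowest_limit = 5   # lower class limit of the first class
class_width = 5
num_classes = 4

frequency = {}
for i in range(num_classes):
    lower = lowest_limit + i * class_width
    upper = lower + class_width - 1   # upper class limit for whole-number data
    label = f"{lower}-{upper}"
    # Count the data values that fall between the two class limits.
    frequency[label] = sum(1 for x in data if lower <= x <= upper)

for label, count in frequency.items():
    print(label, count)
```

Because each class starts one above the previous upper class limit, every data value is counted exactly once.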
One type of graph that shows the frequency distribution of a data set is the Frequency Polygon.
A Frequency Polygon is constructed by taking the following steps:
1) Create a Frequency Distribution Table
2) Find the Class Mark, the value halfway between the upper and lower limits of each
class.
3) The class marks will be labelled on the x-axis
4) The y-axis will be used to record frequency
5) Plot points to indicate the frequency of each class
6) Connect the points with line segments
Example 2
Create a frequency polygon for the following set of data.
12 13 11 9 12 15 16 17 18 20 10
Classes Frequency
5-9 1
10-14 5
15-19 4
20-24 1
The class mark of the first class, 5-9, is the value halfway between 5 and 9. That is 7,
which can be obtained by finding the average of 5 and 9.
Thus, the class marks are 7, 12, 17, and 22, respectively for each class.
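As a quick check, the class marks can be computed as the average of the two class limits. The list name classes below is just an illustrative choice.

```python
# Each class is written as (lower class limit, upper class limit).
classes = [(5, 9), (10, 14), (15, 19), (20, 24)]

# The class mark is the value halfway between the two limits.
class_marks = [(lower + upper) / 2 for lower, upper in classes]
print(class_marks)
```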
3) Label the x-axis with the class mark and
4) Label the y-axis with the frequency.
Practice
Classes Frequency
0.00-1.99
2.00-3.99
2) Create a frequency distribution for the following data set using a class width of 100.
3) The following set of data was collected from oxygen level readings of a local lake.
Readings are measured in mg/L.
Create a frequency distribution for the following data set by deciding on a
reasonable class width.
5) The following set of data was collected from soil samples to understand diversity of
microorganisms. The collected microorganism DNA was measured in 𝜇𝑔/g of soil.
Create a frequency polygon to represent the frequency distribution of the data.
7.34 2.83 3.62 3.24 2.87
2.93 10.59 1.56 3.51 1.72
11.17 8.19 25.00 19.12 9.55
1.3– Histogram
Example
Create a histogram for the following set of data.
12 13 11 9 12 15 16 17 18 20 10
Classes Frequency
5-9 1
10-14 5
15-19 4
20-24 1
The class boundary between the first two classes is 9.5, since it is the halfway value
between 9, the upper class limit, and 10, the lower class limit of its subsequent class.
Thus, the class boundaries between the classes are 9.5, 14.5, 19.5. For the graph, we
also need to include the lowest boundary and the highest upper boundary. These are
4.5, which is just below the first class, and 24.5, which is just above the last class.
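The class boundaries can be sketched in Python as follows. We assume, as in this example, that the gap between one upper class limit and the next lower class limit is the same for every class; the names classes, gap, and boundaries are illustrative.

```python
# Each class is written as (lower class limit, upper class limit).
classes = [(5, 9), (10, 14), (15, 19), (20, 24)]

# Gap between an upper class limit and the next lower class limit (10 - 9 = 1).
gap = classes[1][0] - classes[0][1]

# The lowest boundary sits half a gap below the first lower class limit.
boundaries = [classes[0][0] - gap / 2]
# Every other boundary sits half a gap above an upper class limit.
for lower, upper in classes:
    boundaries.append(upper + gap / 2)

print(boundaries)
```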
3) Label the class boundaries on the x-axis
These will be 4.5, 9.5, 14.5, 19.5, 24.5
There are no gaps between the bars because the x-axis is representing intervals of
numbers and not categories. It is important not to confuse the histogram with bar
graphs.
Unlike a frequency polygon, the bars of a
histogram show a more rigid shape of the
frequency distribution of the data. It would
be easier to visualize if a curve is drawn
closely above the bars. Using the curve, we
can describe the shape of the frequency
distribution.
Types of Distribution Shapes
Potential Outliers
Observing the frequency distribution is also helpful for identifying potential outliers.
The histogram below is representing the data set to its left.
Since the bar on the right is far away from the cluster of the data, we can estimate that the
data value 70 is potentially an outlier.
Practice
1) State the class boundaries of the following given classes
Classes Frequency
5-9 11
10-14 51
15-19 24
20-24 12
2) The following data set shows students’ test scores in percentage.
79 88 37 91 55
67 80 65 75 77
3) The following data set shows the departure delay times in hours of 16 flights
2 1 2 2 2 3 5 2
11 7 0 3 8 19 4 1
4) The following data set shows the HDL cholesterol levels of 27 adult males
44 33 48 46 62 45 43 39 41
47 46 56 38 37 61 41 56 59
40 75 55 48 50 57 38 41 44
c) Create another histogram using a class width of 20.
d) How do the two histograms compare?
e) Which histogram would you choose to represent the data?
1.4 – Stem-and-Leaf Plot
The third graphical representation of the frequency distribution of a data set is the Stem-and-
Leaf plot.
To create a Stem-and-Leaf Plot, data are also sorted into classes. Unlike the frequency polygon
and the histogram, the classes in the Stem-and-Leaf Plot are not represented by Class Marks
or Class Boundaries. The classes in the Stem-and-Leaf Plot are represented by the leftmost
digit(s), which make up the stem. The leaves of the plot always consist of single digits
only, each taken from the rightmost digit of a data value.
Example 1
The following data set is the test scores in percentage of a class. Create a stem-and-leaf plot
to represent the frequency distribution of the data.
15 76 68 55 40
89 87 73 74 81
1) First, create the stem using the leftmost digit. In this case, the stem will consist of
the tens digit.
2) Each value of the stem represents a class. For instance, the 1 of the stem represents
the class 10-19, and the 2 represents the class 20-29, etc.
3) Next, to create the leaves, list the rightmost digit of the data values in its
appropriate class. The leaves will be listed from least to greatest with the lowest
value closest to the stem.
It is important that the leaves are listed with equal spacing among them in order to
represent the frequency distribution accurately.
If we try to visualize a curve drawn over the leaves, we should recognize a frequency
distribution that is skewed left, since there are more values in the higher classes.
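A stem-and-leaf plot for Example 1 can also be produced with a short Python sketch. It assumes two-digit whole-number data, so the stem is the tens digit and the leaf is the ones digit; the names scores, leaves, and lines are our own.

```python
from collections import defaultdict

scores = [15, 76, 68, 55, 40, 89, 87, 73, 74, 81]

# Group the ones digits (leaves) under their tens digit (stem).
leaves = defaultdict(list)
for value in sorted(scores):          # sorting lists the leaves from least to greatest
    stem, leaf = divmod(value, 10)
    leaves[stem].append(leaf)

# Print every stem from the smallest to the largest, including empty classes.
lines = []
for stem in range(min(leaves), max(leaves) + 1):
    row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
    lines.append(f"{stem} | {row}".rstrip())

print("\n".join(lines))
```

Empty stems are kept so that the gaps in the data stay visible, and the leaves are separated by single spaces so that the length of each row reflects its frequency.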
Example 2
The following data set is the test scores in percentage of a class. Create a stem-and-leaf plot
to represent the frequency distribution of the data.
125 176 200 123 226
189 142 173 162 181
Because the leaves must consist of only single digits, the values of the stem will
include 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22.
The leaves will be the ones digit of the data values.
Variations of the Stem-and-Leaf Plot
There are several variations of the stem-and-leaf plot to accommodate data sets. The purpose is
to represent the data in a more meaningful way, to allow a better visualization of the frequency
distribution, or to compare two sets of data.
The first variation is known as the splitting stem-and-leaf plot.
Example 1
The following data set is the weights in pounds of large dogs
79 95 77 78 85 74
76 77 78 80 70 72
With 12 data values and only 3 classes on the stem, it is difficult to see the
frequency distribution with most of the data clustered in the first class, 70-79.
To represent the data’s frequency distribution better, a splitting stem-and-leaf plot
can be used. Creating a splitting stem-and-leaf plot essentially allows the number of
classes to increase by “splitting” the classes.
The new plot shows that the stem consists of 7, 7, 8, 8, 9, and 9, where the classes
are 70-74, 75-79, 80-84, 85-89, etc. The class width of each class is now 5 rather
than 10.
Both plots show a frequency distribution that is skewed right, but the splitting
stem-and-leaf plot also shows that the highest frequency of data occurs in the
class 75-79.
The second variation is the side-by-side stem-and-leaf plot. This variation allows the
representation of two sets of data, which is convenient for comparison.
Example 2
The two data sets are gas prices recorded over 10 days in 2010 and 2016.
2010:
110.1 112.3 110.4 109.5 111.9
111.4 111.5 109.5 109.7 110.2
2016:
95.7 95.4 96.7 95.2 94.1
93.5 93.5 93.2 92.1 91.0
Observing the side-by-side stem-and-leaf plot, we can conclude the following:
- The data in 2016 has a bimodal frequency distribution.
- The data in 2010 has a skewed right distribution because higher frequencies are found
in the lower classes
- In comparison, gas prices in 2016 are generally lower than gas prices in 2010.
Practice
1) The following data set is the weights in pounds of different models of compact cars.
2779 2795 2600 2775 2780
2775 2800 2753 2680 2560
2) The following data set is the age of college students in their first semester of the
Engineering program.
18 18 17 19 18
19 25 23 21 18
20 19 56 32 19
22 23 26 51 20
3) The following data set is the voltage measurements from a home.
4) The following data sets are test scores collected from two Statistics classes. Students
from Class 1 studied from a hardcopy textbook and students from Class 2 studied from
an electronic textbook.
Class 1:
87 90 25 76 72
73 60 85 82 75
80 79 81 61 70
Class 2:
77 62 50 53 52
76 15 99 90 64
69 70 63 71 50
Unit 2: Descriptive Statistics
Measures of Central Tendency are values that are used to describe the center position of a set
of data. The values include the mean, median, and the mode.
The mean, usually known as the “average”, is the measure of center found by adding the data
values and then dividing the sum by the number of data values there are in a set of data.
Different symbols are used to represent the mean depending on whether it is calculated from a
population or a sample set of data.
Population mean: $\mu = \frac{\sum x}{N}$        Sample mean: $\bar{x} = \frac{\sum x}{n}$
The Σ𝑥 symbol refers to the sum of all data values. The value of 𝑁 and 𝑛 refers to the
population size and sample size, respectively. The population or sample size is referring to the
number of data values in the data set.
Where necessary, the mean is rounded to one more decimal place than the original data values.
The median of a set of data is the measure of center that is the middle value when the data
values are arranged in order of increasing or decreasing magnitude.
The following formula computes the location of the median after the data is arranged.
$$\text{Location of } \tilde{x} = \frac{n + 1}{2}$$
The mode of a set of data is the value that occurs with the greatest frequency.
Depending on the frequency distribution of the data, in some cases, one measure of central
tendency would be a better representation than another measure.
Example 1
Collected from several households in a city, the following data set shows the weights of paper
recycled in kg per week.
The mean:
𝑥̅ = 10.89625
𝑥̅ =̇ 10.896 𝑘𝑔
The median:
$$\text{Location of } \tilde{x} = \frac{8 + 1}{2} = 4.5$$
The location of the median is the 4.5th value in the sorted data set.
Because the 4.5th value is between the 4th and the 5th value, the median will be
determined by taking the mean of 11.36 and 11.42.
$$\tilde{x} = \frac{11.36 + 11.42}{2} = 11.39$$
The mode:
In this case, each data value occurs only once. There is no mode.
Example 2
The following data set is the age of college students in their first semester of the Engineering
program.
18 18 17 19 18
19 25 23 21 18
20 19 56 32 19
22 23 26 51 20
The mean:
$$\bar{x} = \frac{484}{20}$$
The median:
$$\text{Location of } \tilde{x} = \frac{20 + 1}{2} = 10.5$$
The location of the median is the 10.5th value in the sorted data set.
Because the 10.5th value is between the 10th and the 11th value, the median will be
determined by taking the mean of those two values, which are 20 and 20.
$$\tilde{x} = 20$$
The mode:
Mean = 24.2
Median = 20
Mode = 18 and 19
We can describe the frequency distribution as skewed right. Data sets that have a
skewed right frequency distribution tend to have a higher mean than median. In this
case, the mean is affected by the extreme data values such as 51 and 56, which
bring the overall average higher. Since the median is location-dependent rather
than value-dependent, the median tends to be a measure of central tendency that
is not affected by extreme values.
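The three measures for Example 2 can be verified with Python's standard statistics module. This is a sketch for checking the hand calculation; the variable names are our own.

```python
import statistics

# Ages of first-semester Engineering students from Example 2.
ages = [18, 18, 17, 19, 18, 19, 25, 23, 21, 18,
        20, 19, 56, 32, 19, 22, 23, 26, 51, 20]

mean = statistics.mean(ages)        # sum of the values divided by n
median = statistics.median(ages)    # mean of the 10th and 11th sorted values
modes = statistics.multimode(ages)  # every value tied for the highest frequency

print(mean, median, modes)
```

The mean comes out larger than the median, which is what we expect for a distribution that is skewed right.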
Practice
1) For the given data sets, determine all the measures of central tendency.
2) For each of the above data sets, observe the measures of central tendency and
predict the shape of the frequency distribution for each.
2.2 – Measures of Variation
The range of the data values is the difference between the maximum data value and the
minimum data value.
The standard deviation of a set of values is a measure of how much data values deviate away
from the mean.
A slightly different formula is used to calculate the standard deviation of a population
versus a sample.
There are two formulas that can be used to calculate the sample standard deviation.
The first one involves the mean, whereas the second one does not.
Population: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$
The variance of a set of values is a measure of variation equal to the square of the standard
deviation.
Population: $\sigma^2$        Sample: $s^2$
Example 1
The following set of data shows the amount of time in minutes students took to complete an
online quiz.
Determine the range, standard deviation, and variance.
12 18 17 10 11
52 12 15 16 10
The range:
The maximum value is 52
The minimum value is 10
𝑅 = 52 − 10
𝑅 = 42
Standard deviation:
First, find the mean of the data set
𝑥̅ = 17.3
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Observing the formula, next, find the square of the difference between each
data value and the mean.
x      x − x̄                  (x − x̄)²
12     12 – 17.3 = -5.3       28.09
18     0.7                    0.49
17     -0.3                   0.09
10     -7.3                   53.29
11     -6.3                   39.69
52     34.7                   1204.09
12     -5.3                   28.09
15     -2.3                   5.29
16     -1.3                   1.69
10     -7.3                   53.29
The sum of the squares of the difference between each data value and the
mean:
$$\sum (x - \bar{x})^2 = 1414.10$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1414.10}{10 - 1}}$$
$$s = 12.5348403349314$$
The standard deviation is expressed with rounding of one more decimal place
than the given data.
𝑠 =̇ 12.5
This means that, on average, the data values deviate from the mean of 17.3 minutes
by roughly 12.5 minutes.
Variance:
$$s^2 = (12.5348403349314)^2$$
$$s^2 \doteq 157.1$$
Comparing the three measures of variation,
$$R = 42, \quad s \doteq 12.5, \quad s^2 \doteq 157.1$$
the variance is the measure that tends to be most dramatically influenced by potential
outliers.
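The hand calculation in Example 1 can be checked with a short Python sketch. It computes the sample standard deviation twice, once with the formula that uses the mean and once with the shortcut formula that uses only the sums of x and x squared. The variable names (s_definition, s_shortcut) are our own.

```python
import math

# Quiz completion times in minutes from Example 1.
times = [12, 18, 17, 10, 11, 52, 12, 15, 16, 10]
n = len(times)
mean = sum(times) / n

# First formula: square root of the sum of squared deviations over n - 1.
s_definition = math.sqrt(sum((x - mean) ** 2 for x in times) / (n - 1))

# Second formula: uses the sum of x and the sum of x squared, not the mean.
sum_x = sum(times)
sum_x2 = sum(x * x for x in times)
s_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

variance = s_definition ** 2
data_range = max(times) - min(times)

print(data_range, round(s_definition, 1), round(s_shortcut, 1), round(variance, 1))
```

Both formulas should agree up to rounding, which is a convenient way to catch arithmetic slips in the table of squared differences.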
The measures of variation are useful for describing the dispersion of data. This is because
strictly relying on measures of central tendency as a description of a set of data might not be
enough and might be even misleading in some cases.
Data set 1:
16 18 20
Data set 2:
2 18 34
Both data sets have a mean of 18 and a median of 18. However, just by observation, we can
conclude that Data set 2 is more dispersed than Data set 1. The values in Data set 1 are more
consistent with one another. On the other hand, values in Data set 2 are further apart from one
another. Calculating measures of variation can confirm this observation:
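A minimal sketch of that calculation, assuming both data sets are treated as samples (so the n − 1 formula applies). The helper name describe_variation is hypothetical.

```python
import statistics

def describe_variation(data):
    """Return (range, sample standard deviation, sample variance)."""
    data_range = max(data) - min(data)
    return data_range, statistics.stdev(data), statistics.variance(data)

data_set_1 = [16, 18, 20]
data_set_2 = [2, 18, 34]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    r, s, s2 = describe_variation(data)
    print(f"{name}: range = {r}, s = {s:.1f}, variance = {s2:.1f}")
```

All three measures are larger for Data set 2, even though the two data sets share the same mean and median.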
As a measure of variation, standard deviation is particularly useful in helping us identify unusual
values. Unusual values in a data set are those that deviate by more than 2 standard deviations
from the mean.
Example 2
The following set of data shows the amount of sodium measured in grams found in 5 samples
of canned food products.
The range:
𝑅 = 11.0 − 2.5 = 8.5
The range of the data set is 8.5
Standard deviation:
𝑥̅ = 5.12
x      x − x̄                  (x − x̄)²
3.4    3.4 – 5.12 = -1.72     2.9584
2.5    -2.62                  6.8644
5.5    0.38                   0.1444
3.2    -1.92                  3.6864
11.0   5.88                   34.5744
$$\sum (x - \bar{x})^2 = 48.2280$$
$$s = \sqrt{\frac{48.2280}{5 - 1}}$$
$$s = 3.47231910975935$$
$$s \doteq 3.47$$
Variance:
$$s^2 = (3.47231910975935)^2$$
$$s^2 \doteq 12.06$$
If there are any unusual values, they would be more than 2 standard deviations away from the
mean.
$$\bar{x} \pm 2s = 5.12 \pm 2(3.47) = 5.12 \pm 6.94$$
This means that values that are greater than 12.06 or less than −1.82 would be
considered as unusual values. There are no such values in the set of data.
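The same check can be sketched in Python. Note that the code uses the unrounded standard deviation, so its limits differ slightly from the hand calculation, which used the rounded value 3.47. The variable names are our own.

```python
import statistics

# Sodium content in grams for the 5 samples in Example 2.
sodium = [3.4, 2.5, 5.5, 3.2, 11.0]

mean = statistics.mean(sodium)
s = statistics.stdev(sodium)   # sample standard deviation

# Limits that lie 2 standard deviations below and above the mean.
lower_limit = mean - 2 * s
upper_limit = mean + 2 * s

# A value is unusual if it falls outside these limits.
unusual = [x for x in sodium if x < lower_limit or x > upper_limit]
print(round(lower_limit, 2), round(upper_limit, 2), unusual)
```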
Practice
1) Collected from a local grocery store, the following data set is the sales collected over 6
days. Values are in the unit of dollars in thousands.
23 44 31 32 25 30
2) The following data set is the weight of household organic wastes measured in kilograms
2.73 9.31 3.59 5.36 1.47 7.06 2.52
3) The following data set is the magnitude of earthquakes collected from a sample of 9
earthquakes.
0.27 1.64 1.32 3.44 0.91 1.76 1.01 1.26 0.02
2.3 – Measures of Position
Measures of Position are also known as Measures of Relative Standing. These are values that
express the location of a data value relative to all the other values in the same set of data.
In other words, measures of position are used to express the ranking of a data value.
The measures of position that will be discussed in this chapter are Percentiles and Quartiles.
Percentiles are measures of location that divide a data set into 100 groups with about 1% of
the total number of data values contained in each group. The $k$th percentile, denoted by $P_k$, is
the data value that is at a location that separates $k\%$ of the data below and $(100 - k)\%$ of the
data above.
For example, if a student’s test score is at the 90th percentile, it might be considered a very high
test score compared to the class average because that test score has 90% of the scores below it
and 10% of the scores above it. In other words, that student scored better than 90% of his/her
classmates.
Quartiles are measures of location that divide a data set into four groups with about 25% of
the data values in each group. The symbols $Q_1$, $Q_2$, $Q_3$ represent the First Quartile, Second Quartile,
and Third Quartile.
The First Quartile is the data value that is at a location that separates 25% of the data below
and 75% of the data above. So $Q_1$ is equivalent to $P_{25}$.
The Second Quartile is the data value that is at a location that separates 50% of the data below
and 50% of the data above. So $Q_2$ is equivalent to $P_{50}$.
The Third Quartile is the data value that is at a location that separates 75% of the data below
and 25% of the data above. So $Q_3$ is equivalent to $P_{75}$.
Thus, the formula to determine the location of a quartile is the same as the formula for the
location of the equivalent percentile.
The Interquartile Range is the range defined by the difference between the Third and the First
Quartile.
$$IQR = Q_3 - Q_1$$
Example 1
The following set of data is the final marks of students in a Statistics class.
79 88 37 91 55
67 80 65 75 77
c) Determine the mark that is the First Quartile
The location of $Q_1$:
$$L = \frac{25}{100}(10) = 2.5$$
$$IQR = Q_3 - Q_1$$
$Q_1 = 65$ from the above calculations.
Finding $Q_3$:
$$L = \frac{75}{100}(10) = 7.5$$
So the $IQR = 80 - 65 = 15$.
1) If 𝐿 is a whole number, then 𝑃𝑘 is midway between the 𝐿𝑡ℎ value and the next value in
the data set that is sorted from least to greatest
2) If 𝐿 is not a whole number, then 𝐿 is rounded up to the next whole number. 𝑃𝑘 will be
located at the rounded value of 𝐿.
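The two rules can be sketched as a small Python function. It assumes the location is computed as L = (k/100)n, as in the quartile calculations of Example 1, that k is a whole number strictly between 0 and 100, and that the data are already sorted from least to greatest. The function name percentile is our own.

```python
def percentile(sorted_data, k):
    """Return P_k using the two location rules (data sorted ascending)."""
    n = len(sorted_data)
    # L = (k / 100) * n, split into a whole part and a remainder so that
    # the whole-number test is exact for integer k.
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # Rule 1: L is a whole number, so take the value midway between
        # the L-th value and the next value (positions count from 1).
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Rule 2: L is not a whole number, so round up to position whole + 1,
    # which is index `whole` when counting from 0.
    return sorted_data[whole]

marks = sorted([79, 88, 37, 91, 55, 67, 80, 65, 75, 77])
q1 = percentile(marks, 25)
q3 = percentile(marks, 75)
print(q1, q3, q3 - q1)
```

For the marks in Example 1 this reproduces the first and third quartiles found by hand.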
Why is the Interquartile Range useful?
Because the 𝐼𝑄𝑅 is also a range, it can be used to help describe the dispersion of data. But
more importantly, the 𝐼𝑄𝑅 can be used to identify outliers.
That means any value from the data set that is less than 42.5 or greater than 102.5 would be
an outlier.
The data set was
37 55 65 67 75 77 79 80 88 91
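The outlier check can be sketched in Python. We assume the limits are Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, a common convention that reproduces the limits 42.5 and 102.5 quoted above; the names lower_fence and upper_fence are our own.

```python
marks = [37, 55, 65, 67, 75, 77, 79, 80, 88, 91]

# Quartiles found in Example 1.
q1, q3 = 65, 80
iqr = q3 - q1

# Assumed convention: limits lie 1.5 interquartile ranges beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in marks if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

Under this assumption, the mark 37 falls below the lower limit and is the only outlier in the data set.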
Practice
3) Referring to the data set in Question 1, determine, if any, the outliers of the data set.
4) The following data set is the price of gasoline (cents/L) collected from various cities in
Canada in February 2016:
St. John's 89.7
Charlottetown 86.7
Halifax 86.2
Saint John 86.9
Québec 90.1
Montréal 98.1
Ottawa 85.1
Toronto 89.8
Thunder Bay 87.1
Winnipeg 74.7
Regina 72.5
Saskatoon 74
Edmonton 63.1
Calgary 73.5
Vancouver 105.9
Victoria 98.1
Whitehorse 90.8
a) 𝑃60
b) 𝑄3
c) 𝑃50
d) 𝑄2
e) 𝑃85
f) Which cities have gasoline prices that are below the 20th percentile?
g) Which cities have gasoline prices that are considered to be the outlier?
5) How do the median, the second quartile, and the 50th percentile compare? Explain.
Unit 3: Descriptive Methods for Bivariate Data
Recall from Unit 1 that bivariate data are data that have two variables. For example, if
students were asked for the number of hours they studied and their final course mark, the
data collected would have two variables: hours of studying and course mark. Often bivariate
data are collected to determine the relationship between the two variables. For example, we
might wonder whether more hours of studying relate to a higher course mark.
Bivariate data are usually displayed using a scatterplot. We consider the two variables to
have a correlation with one another when the values of one variable are associated with
the values of the other. When plotted on a scatterplot, if the plotted points display a pattern
that can be approximated by a straight line, then the two variables are considered to have a
linear correlation.
Example 1
The following is a set of data showing the amount that a TV repair technician charges for his/her service.
There is a $15 flat-rate service charge and a $20 per hour labour charge.
Hours of labour    Charge ($)
0                  15
1                  35
2                  55
3                  75
Example 2
The following is a set of data relating the years of experience an employee has and his/her salary in
thousands of dollars.
Years of experience    Salary in thousands
12                     39
16                     41
6                      33
23                     44
27                     48
8                      34
5                      32
19                     44
23                     46
13                     37
16                     43
8                      37
The difference between Example 1 and Example 2 is that the data values in Example 1 follow a
perfect straight line when graphed, because the charge was calculated from the number of hours
worked. In Example 2, there is some variation in the data values. Its scatter plot shows that
the plotted data values cluster around a straight line. Given that such a pattern can be observed, we
can say that the salary of an employee has a linear correlation with the employee’s years of
experience.
A straight line can be drawn to approximate the pattern. This is called a trend line or a line of
best fit. It is drawn as close as possible to every point on the scatter plot.
Both of the above scatterplots show a positive linear correlation because as one variable
increases, the other variable also increases.
When comparing Example 1 and Example 2, Example 1 has a stronger linear correlation than
Example 2 because its points are closer to the line of best fit.
Types of Correlation
The following six diagrams show the types of correlation there are. A and B are examples of
negative linear correlation. D and E are examples of positive linear correlation. When
comparing, A and D have a stronger linear correlation than B and E. Finally, we can say that C
and F do not follow a linear correlation as a straight line cannot be used to approximate the
trend of the data.
If the data values cluster around a horizontal line or a vertical line, then there is no
correlation between the variables.
The Line of Best Fit
The line of best fit can be used to provide a visual representation, making it more convenient
for us to decide the type of correlation the data show. However, drawing the line of best fit can
be very subjective. It is an estimation. If the scatter plot is used to predict data, then a hand-drawn
line of best fit will not give very accurate results.
Given a set of bivariate data, the line of best fit is modeled by the equation
𝑦̂ = 𝑏1 𝑥 + 𝑏0
where
𝑥 represents the data value of the variable graphed along the 𝑥-axis
𝑦̂ represents the 𝑦 values on the best fit line corresponding to the data values of the
independent variable.
𝑏1 is the slope of the line of best fit
𝑏0 is the 𝑦-intercept of the line of best fit.
If the slope of the line of best fit, 𝑏1 , is positive, then the data set has a positive linear
correlation. If 𝑏1 is negative, then the data set has a negative linear correlation.
In this scatter plot, a line of best fit is drawn. A point of the line of best fit would have a
coordinate of (𝑥, 𝑦̂). Directly above or below that point is the actual data value plotted, which
would have a coordinate of (𝑥, 𝑦). The distance in between, which is 𝑦̂ − 𝑦, is known as the
error. Since the line of best fit should be drawn as close as possible to all the data points, this
means that the location of the line must be chosen where the total error is minimized.
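If the slope and the $y$-intercept are calculated as
$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
$$b_0 = \bar{y} - b_1 \bar{x}$$
then the error is minimized.
Reminder that $\bar{x}$ is the mean of the $x$ values, and $\bar{y}$ is the mean of the $y$ values.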
Example 1
Determine the linear regression line for the following data set.
Create a scatterplot and draw the line of best fit.
𝒙 𝒚
2 16
5 15
3 15
8 21
       x          y          xy           x²
       2          16         32           4
       5          15         75           25
       3          15         45           9
       8          21         168          64
Sum    ∑x = 18    ∑y = 67    ∑xy = 320    ∑x² = 102
$$b_1 = \frac{(4)(320) - (18)(67)}{(4)(102) - (18)^2} = \frac{1280 - 1206}{408 - 324} = \frac{74}{84}$$
$$b_1 = 0.880952381\ldots$$
$$b_1 \doteq 0.88$$
Next, calculate $b_0$:
$$b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - b_1 \left(\frac{\sum x}{n}\right)$$
$$b_0 = \frac{67}{4} - (0.880952381)\left(\frac{18}{4}\right)$$
$$b_0 = 16.75 - 3.964285715$$
$$b_0 = 12.78571429$$
$$b_0 \doteq 12.79$$
𝑦̂ = 0.88𝑥 + 12.79
The equation of the line of best fit shows that the 𝑦-intercept is 12.79. Plot 12.79 as the
first point on the 𝑦-axis. Find the next point by substituting another 𝑥 value to find 𝑦̂.
The two variables from this data set have a positive linear correlation with one another.
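The calculation in this example can be reproduced with a short Python sketch. It follows the same arithmetic as the worked solution (the sums of x, y, xy, and x squared); the function name least_squares_line is our own.

```python
def least_squares_line(xs, ys):
    """Return (b1, b0) for the line of best fit y-hat = b1 * x + b0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope, using the same sums as the table in the example.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus slope times mean of x.
    b0 = sum_y / n - b1 * (sum_x / n)
    return b1, b0

xs = [2, 5, 3, 8]
ys = [16, 15, 15, 21]
b1, b0 = least_squares_line(xs, ys)
print(round(b1, 2), round(b0, 2))

# Predicted y-hat for a chosen x, for example to plot a second point.
y_hat_at_8 = b1 * 8 + b0
```

Keeping the unrounded slope when computing the intercept, as the worked solution does, avoids carrying a rounding error into the final equation.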
Appropriateness of linear regression
Sometimes data do not follow a linear tendency and the points on the scatter plot do not
cluster about a straight line with a nonzero slope. When using the method above to determine
the Regression line in these kinds of situations, we will find a line that is not going to be very
helpful in predicting the 𝑦 values for a given 𝑥 value. To distinguish between appropriate and
inappropriate situations for using the linear regression we will later introduce the linear
correlation coefficient.
Two examples are shown below. For the suggested curve relationship, we can still find the line
of best fit, but the predictions for 𝑦 will be inaccurate.
Practice
1) Determine the linear regression line for the following sets of data.
Using the equation, state the type of linear correlation for the set of data.
a) .
𝒙 𝒚
12 6
13 7
8 4
14 8
20 10
b) .
𝒙 𝒚
11 9
14 10
18 8
7 14
9 11
2) The following set of data shows the amount of recycling in kilograms collected from
households
a) Predict the correlation between the amount of paper recycled and the amount of
plastic recycled by households.
d) Describe the relationship between the amount of paper recycled and the amount of
plastic recycled by households.
3) The following set of data shows the weight of a vehicle (in 1000 lbs) and its fuel
efficiency (in miles/gallon).
Weight of vehicle (1000 lbs)    Fuel efficiency (miles/gallon)
3.56 32
2.70 33
3.92 25
3.25 34
4.07 25
2.46 38
3.55 27
c) Describe the relationship between the weight of a vehicle and its fuel efficiency.
3.2 – The Linear Correlation Coefficient
In the previous chapter, we discussed the different types of linear correlations. We can describe
a linear correlation by stating whether it is positive or negative. Also, we can describe it by
stating whether it is strong or weak. How strong a linear correlation is depends on how close
the data values are to the linear regression line. But, how close do they have to be in order to
be considered as a strong correlation? Using our observations of a scatter plot and the linear
regression line is very subjective.
Thus, the Linear Correlation Coefficient is used to measure the strength of the linear
correlation between the 𝑥 and 𝑦 values in a set of data.
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$
where
𝑟 represents the linear correlation coefficient;
𝑥 and 𝑦 represent the values of the independent and the dependent variables;
𝑥̅ and 𝑦̅ represent the means of the 𝑥 and 𝑦 values, respectively;
𝑛 represents the number of values in the sample;
𝑠𝑥 and 𝑠𝑦 represent the sample standard deviations of the 𝑥 and 𝑦
values, respectively.
An equivalent form is
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
This formula tends to be more convenient because the standard deviations do not need
to be calculated.
1. The value of 𝑟 is always between −1 and 1, inclusive: −1 ≤ 𝑟 ≤ 1.
Example
Determine the linear correlation coefficient for the following data set
Hours of studying    Test score
0 50
0 46
1 73
2 72
2 88
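The coefficient for this data set can be computed directly from the sums. The sketch below is our own illustration in Python; it assumes the standard computational form of 𝑟, which needs only ∑𝑥, ∑𝑦, ∑𝑥𝑦, ∑𝑥² and ∑𝑦², and the function name `correlation` is hypothetical.

```python
from math import sqrt

# Hours of studying and test scores from the example above.
hours = [0, 0, 1, 2, 2]
scores = [50, 46, 73, 72, 88]

def correlation(xs, ys):
    """Linear correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

r = correlation(hours, scores)
print(round(r, 2))
```

A value close to 1 indicates a strong positive linear correlation between hours of studying and test score.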
Coefficient of Determination
When 𝑟 is calculated, we can say that for 𝑟 = 0.91, it is a strong linear correlation. But if 𝑟 has a
value of 0.70, it is hard to decide if the correlation is strong enough. This question led to the
introduction of a new coefficient called the coefficient of determination.
To understand the coefficient of determination, we need to introduce some new terms:
a) The explained deviation is the difference between the value of 𝑦 from the regression
line and the mean value for 𝑦:
𝑒𝑥𝑝𝑙𝑎𝑖𝑛𝑒𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑦̂ − 𝑦̅
b) The unexplained deviation is the difference between the value of 𝑦 and the value of 𝑦
from the regression line:
𝑢𝑛𝑒𝑥𝑝𝑙𝑎𝑖𝑛𝑒𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑦 − 𝑦̂
The closer the points are to the line, the greater the proportion of explained variation.
The proportion of explained variation is the square of 𝑟:
$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{SSR}{TSS}$$
The coefficient of determination is a better measure because it represents the proportion of
variation in the dependent variable that can be explained by the variation in the independent
variable.
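To see the ratio of explained to total variation at work, the sketch below reuses the hours-of-studying data from the earlier example. It is an illustration under our own assumptions: SSR is taken as the sum of squared explained deviations, TSS as the sum of squared deviations of 𝑦 from 𝑦̅, and the name `sse` (sum of squared unexplained deviations) is introduced here only to show that the two parts add up to the total.

```python
# Explained, unexplained and total variation for a fitted regression line.
xs = [0, 0, 1, 2, 2]
ys = [50, 46, 73, 72, 88]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b1 = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
    n * sum(x * x for x in xs) - sum(xs) ** 2
)
b0 = y_bar - b1 * x_bar
y_hat = [b1 * x + b0 for x in xs]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation
tss = sum((y - y_bar) ** 2 for y in ys)               # total variation

r_squared = ssr / tss
print(round(r_squared, 3))
```

Here roughly 84% of the variation in test scores is explained by the variation in hours of studying.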
Practice
1) Determine the linear correlation coefficient and coefficient of determination of the
following sets of data.
a)
𝒙 𝒚
11 19
15 27
16 31
20 29
b)
𝒙 𝒚
32 5
26 9
28 12
23 16
33 5
2) If the linear correlation coefficient is -0.22 between two variables, what conclusion can
be drawn regarding their relationship?
3) If the coefficient of determination is 0.67 between two variables, what conclusion can
be drawn regarding their relationship?
4) The following set of data shows the weight of a vehicle (in 1000 lbs) and its fuel
efficiency (in miles/gallon).
Unit 4: Discrete Probability Distributions
Sample Space – the collection of all possible distinct outcomes that can occur when an
experiment is performed
Example 1
Tim Hortons, a Canadian coffee franchise, launched the campaign, “Roll Up the Rim”, for
customers to have a chance to win prizes with the purchase of a beverage. It is stated that “the
odds of winning a prize are approximately 1 in 6. Tim Hortons audits and monitors the odds
daily to ensure the odds remain the same.”
In this scenario, the experiment would be the activity in rolling up the rim of a coffee cup to see
whether or not you have won any prizes. The sample space in general is “winning a prize” or
“please play again”. If we consider the specific outcomes then “winning a prize” would include
“winning a coffee”, “winning a donut”, “winning a car”, etc.
Let’s consider two events in this experiment: “winning a prize” and “not winning”.
Given that the odds of winning are 1 in 6, the probabilities of winning and not
winning are as follows:
$$P(w) = \frac{1}{6} \quad \text{and} \quad P(\bar{w}) = \frac{5}{6}$$
Example 2
2) Determine the sample space in an experiment of rolling two dice where the order
of the outcome does matter
In this case, we will use (𝑎, 𝑏), where 𝑎 denotes the outcome from the first roll
and 𝑏 denotes the outcome from the second roll
𝑆 = { (1,1) ; (1,2) ; (1,3) ; (1,4) ; (1,5) ; (1,6) ;
      (2,1) ; (2,2) ; (2,3) ; (2,4) ; (2,5) ; (2,6) ;
      (3,1) ; (3,2) ; (3,3) ; (3,4) ; (3,5) ; (3,6) ;
      (4,1) ; (4,2) ; (4,3) ; (4,4) ; (4,5) ; (4,6) ;
      (5,1) ; (5,2) ; (5,3) ; (5,4) ; (5,5) ; (5,6) ;
      (6,1) ; (6,2) ; (6,3) ; (6,4) ; (6,5) ; (6,6) }
3) Determine the probability of rolling an even number when rolling one die
Let 𝐴 represent the event of rolling an even number
$$P(A) = \frac{3}{6} = \frac{1}{2}$$
5) Determine the probability of rolling repeating digits when rolling two dice
Let 𝐵 represent the event of rolling repeating digits
$$P(B) = \frac{6}{36} = \frac{1}{6}$$
6) Determine the probability of rolling no repeating digits when rolling two dice
$$P(\bar{B}) = 1 - P(B) = 1 - \frac{1}{6} = \frac{5}{6}$$
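The sample space and the probabilities above can also be found by listing every outcome. The following Python sketch is our own illustration; the helper name `probability` is hypothetical, and exact fractions are used so that the results match the hand calculations.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Ordered pairs (a, b): a is the first roll, b is the second roll.
sample_space = list(product(faces, repeat=2))

def probability(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_repeat = probability(lambda outcome: outcome[0] == outcome[1])
p_no_repeat = 1 - p_repeat  # complement rule

# One die: even faces out of six equally likely faces.
p_even = Fraction(sum(1 for f in faces if f % 2 == 0), len(faces))

print(len(sample_space), p_repeat, p_no_repeat, p_even)
```

Counting outcomes this way only works because all outcomes in the sample space are equally likely.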
Random Variables
A random variable is a variable with possible values that are numerical outcomes of a random
phenomenon. A random variable is assigned to an experiment to represent all the distinct
outcomes that could occur. A random variable could be either discrete or continuous.
A discrete random variable is one that can only have distinct, countable values.
A continuous random variable is one that can take infinitely many, uncountable values, such as
any value within an interval.
Example
1) A medical equipment company is collecting data on its sales every day between April
and May. Prior to April, a clinic had already committed to purchasing 10 units per day
from the company. Assign a random variable to represent the possible outcomes.
Let 𝑋 represent the number of units sold every day between April and May.
Because 𝑋 could be any whole number greater than or equal to 10, 𝑋 ≥ 10
This random variable, 𝑋, is a discrete random variable because the values of 𝑋 are
countable values. For example, 𝑋 can be 10, 11, 12, 13… and so on.
2) A meteorologist is collecting data on the daily highest temperature between May and
June.
Assign a random variable to represent the possible outcomes.
Let 𝑋 represent the daily highest temperature between May and June.
Because 𝑋 could be any value and there are infinitely many values, 𝑋 is a continuous random
variable. In other words, 𝑋 could be any value, negative or positive. 𝑋 could be
1.1 °C, 1.2 °C, or 1.21 °C, and so on. This shows that there are infinitely many possible values.
Practice
4) In a chemistry experiment, the temperature change of a chemical reaction was
recorded. The chemist stated that the chemical reaction is an exothermic reaction,
meaning that the temperature will only increase. This means that the temperature
change will only be positive.
a) Assign a random variable to represent the possible temperature change that
could be recorded.
b) What is the random variable’s possible range of values?
c) Is the above random variable discrete or continuous? Explain.
d) What is the probability for the temperature change of the reaction to be −2 °C?
4.2 – Introduction to Discrete Probability Distribution
We learned from the previous chapter that a random variable can be used to represent the
possible outcomes of an experiment. If we wish to represent the probability of each of the
possible outcomes, we can use a probability distribution.
A probability distribution is a description that gives the probability of each value of the
random variable. This representation can be in the format of a table,
formula, or graph.
We will focus on representing probability distributions using a table and a histogram. Recall
that there are two types of random variables, discrete and continuous. Therefore, there are two
types of probability distributions, discrete and continuous.
Example 1
Using a table and a histogram, represent the probability distribution of rolling a die.
First, create a table. Let 𝑥 be the random variable representing the outcomes from rolling a die.
𝒙    𝑷(𝒙)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6
Because the random variable is a discrete one, this is a discrete probability distribution.
The histogram:
[Histogram: 𝑃(𝑥) on the vertical axis against 𝑥 = 1 to 6 on the horizontal axis; all six bars have the same height, 1/6 ≈ 0.167]
Observing the histogram, we can conclude that the probability distribution of rolling a
die is a uniform distribution.
Example 2
A survey was given to 100 people to determine how they rate the customer service of a
company. People completing the survey could choose a satisfaction rating from 1 to 5.
The survey shows that 23% of the people rated 1, 14% rated 2, 7% rated 3, 30% rated 4, and
26% rated 5.
With the given information, we know that the probability of finding a person who rated
1 is 23%. The remaining probabilities will be determined the same way.
𝒙 𝑷(𝒙)
1 0.23
2 0.14
3 0.07
4 0.30
5 0.26
[Histogram: 𝑃(𝑥) on the vertical axis against 𝑥 = 1 to 5 on the horizontal axis, with bar heights 0.23, 0.14, 0.07, 0.30, and 0.26]
Observing the probability distribution from the histogram, we can conclude that the
distribution is bimodal.
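Every discrete probability distribution satisfies the following:
1) The sum of the probabilities of all values of 𝑥 is 1
∑ 𝑃(𝑥) = 1
2) The sum of the areas of the bars on the probability distribution histogram is 1
3) Each value of 𝑃(𝑥) is between 0 and 1, inclusive
0 ≤ 𝑃(𝑥) ≤ 1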
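These conditions are easy to test for any table of probabilities. The sketch below is our own illustration; the function name `is_valid_distribution` is hypothetical, and a small tolerance is assumed when comparing the sum to 1 because decimal probabilities are stored approximately by a computer.

```python
from math import isclose

def is_valid_distribution(probs):
    """Check that every P(x) is between 0 and 1 and that the P(x) sum to 1.

    probs maps each value x of the random variable to its probability P(x).
    """
    in_range = all(0 <= p <= 1 for p in probs.values())
    sums_to_one = isclose(sum(probs.values()), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one

# The two examples from this section.
die = {x: 1 / 6 for x in range(1, 7)}
survey = {1: 0.23, 2: 0.14, 3: 0.07, 4: 0.30, 5: 0.26}

print(is_valid_distribution(die), is_valid_distribution(survey))
```

A table that fails either check cannot be a probability distribution, no matter how the data were collected.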
Practice
1) In an experiment, two coins are tossed, one after another. The result of interest is the
number of Tails that show up in the outcome.
a) Assign a random variable to represent the number of Tails in the coin toss results.
b) Create a probability distribution table for this experiment.
c) Create a histogram of the probability distribution.
d) Describe the shape of the probability distribution.
2) A student is guessing the answer to a multiple-choice question on a quiz. There are 5 choices for the
question.
a) Create a probability distribution table for this experiment.
b) Create a histogram for the probability distribution.
c) Describe the shape of the probability distribution.
3) In a college survey, students were asked what their preference was for their mode of
learning. The following table shows the results.
a) Suppose we let 𝑥 = {1, 2, 3, 4, 5} represent each of the modes of learning, create a
probability distribution table for this experiment.
b) Create a histogram for the probability distribution.
c) Describe the shape of the probability distribution.
d) Calculate the sum of 𝑃(𝑥).
4) Given the following table based on results from a survey,
𝒙 𝑷(𝒙)
1 0.60
2 0.14
3 0.35
4.3 – The Mean and Variance of a Probability Distribution
The mean and variance of a probability distribution describe the “average” value and the variation of the
random variable, respectively.
The mean of a probability distribution is also known as the expected value, and it is a measure
of central tendency of a probability distribution.
For a probability distribution, the mean represents the average value we expect to get if
trials of the experiment are continued indefinitely.
𝐸(𝑥) = 𝜇 = ∑ 𝑥 ∙ 𝑃(𝑥)
Example 1
Determine the mean and variance for the following probability distribution.
Two coins are tossed, one after another. The following probability distribution table shows
the probability of 𝒙, where 𝒙 is a random variable representing the number of tails.
𝒙    𝑷(𝒙)
0    0.25
1    0.50
2    0.25
𝐸(𝑥) = ∑ 𝑥 ∙ 𝑃(𝑥)
𝐸(𝑥) = (0)(0.25) + (1)(0.50) + (2)(0.25)
𝐸(𝑥) = 1
The mean of the probability distribution is 𝜇 = 1. This means that the expected value of
the distribution is 1. If the toss of two coins is repeated indefinitely, the average number of
tails per toss is 1.
𝜎² = ∑ (𝑥 − 𝜇)² ∙ 𝑃(𝑥)
𝜎² = (0 − 1)²(0.25) + (1 − 1)²(0.50) + (2 − 1)²(0.25)
𝜎² = 0.5
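The same two numbers can be obtained from the table with a few lines of code. This sketch is our own illustration; it assumes the usual definition of the variance of a discrete distribution, the sum of (𝑥 − 𝜇)² ∙ 𝑃(𝑥), and the function name `mean_and_variance` is hypothetical.

```python
def mean_and_variance(dist):
    """Return (mean, variance) of a discrete probability distribution.

    dist maps each value x of the random variable to its probability P(x).
    """
    mu = sum(x * p for x, p in dist.items())
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, variance

# Number of tails when two coins are tossed.
tails = {0: 0.25, 1: 0.50, 2: 0.25}
mu, variance = mean_and_variance(tails)
print(mu, variance)
```

The function works for any table whose probabilities sum to 1, so it can be reused for the practice questions in this section.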
Example 2
A box has five tickets, numbered 1, 2, 3, 4, and 5. Two tickets are drawn from the box one
after another. After the first ticket is drawn, it is returned to the box before the
second ticket is drawn. Let 𝒙 be a random variable representing the number of odd numbers drawn.
Three of the five tickets are odd, so each draw gives an odd number with probability 3/5
and an even number with probability 2/5.
𝒙    𝑷(𝒙)
0    (2/5 × 2/5) = 4/25
1    (2/5 × 3/5) + (3/5 × 2/5) = 12/25
2    (3/5 × 3/5) = 9/25
𝐸(𝑥) = ∑ 𝑥 ∙ 𝑃(𝑥)
$$E(x) = (0)\left(\frac{4}{25}\right) + (1)\left(\frac{12}{25}\right) + (2)\left(\frac{9}{25}\right) = 0 + \frac{12}{25} + \frac{18}{25} = \frac{30}{25}$$
𝐸(𝑥) = 1.2
The mean of the probability distribution is 𝜇 = 1.2. This means that the expected value
of the distribution is 1.2. If two tickets are drawn one after another indefinitely, the
average number of odd numbers in the outcome is 1.2.
𝜎² = 0.48
Example 3
In a Pick 3 Lottery, you can bet $10 by selecting three digits, each between 0 and 9, inclusive.
The numbers do not repeat themselves in the Lottery. If the same three numbers you picked
are drawn in the same order, you win and collect $300.
Let 𝑥 represent the amount of winnings
𝑥 = {−10, 290}
𝒙      𝑷(𝒙)
290    1/10 × 1/9 × 1/8 = 1/720
−10    1 − (1/10 × 1/9 × 1/8) = 719/720
$$E(x) = (290)\left(\frac{1}{720}\right) + (-10)\left(\frac{719}{720}\right) = \frac{290 - 7190}{720} = -\frac{6900}{720}$$
$$E(x) = -9.58\overline{3}$$
The expected winnings for playing the Pick 3 Lottery are about −$9.58. In other words, a player
loses about $9.58 on average for every $10 bet.
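Expected values with awkward fractions are a good place to check the arithmetic by machine. The sketch below is our own illustration and uses exact fractions; the variable names are hypothetical. It assumes, as in the example, a net gain of $290 on a win and a net loss of $10 otherwise.

```python
from fractions import Fraction

# Three different digits drawn in a specific order: 1/10 * 1/9 * 1/8.
p_win = Fraction(1, 10) * Fraction(1, 9) * Fraction(1, 8)

# Net winnings x and their probabilities P(x).
outcomes = {290: p_win, -10: 1 - p_win}

expected = sum(x * p for x, p in outcomes.items())
print(expected, float(expected))
```

A negative expected value means the player loses money on average, which is what we should expect from a lottery.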
Practice
𝒙 𝑷(𝒙)
1 0.52
2 0.06
3 0.42
b)
𝒙    𝑷(𝒙)
0    1/2
1    1/3
2    1/12
3    1/12
3) A box has five tickets, each labelled, A, B, C, D, or E. Two are to be randomly selected
without replacement. Let 𝑥 represent the number of occurrences of either a ticket
labelled A or a ticket labelled B. Determine the expected value of occurrences of A or B.
4) In a card game at a casino, the player is to select a card from a deck of 52 cards. Suppose
the casino will pay $25 if the player selects a King. If the player does not select a King,
the player loses $2 to the game.
a) What is the expected value of the casino’s winning?
b) What is the expected value of the player’s winning?
4.4 – Binomial Probability Distribution
A unique and important type of probability distribution is the binomial probability distribution.
It allows calculations of probabilities from experiments with only two possible outcomes or two
categories of outcomes.
The probabilities in such an experiment can be calculated by using the Binomial Probability
Formula:
$$P(x) = \frac{n!}{x!\,(n - x)!}\; p^x q^{n-x}$$
where
𝑛 - The number of trials.
𝑥 - The number of successes among the 𝑛 trials.
𝑝 - The probability of success on an individual trial.
𝑞 - The probability of failure on an individual trial (𝑞 = 1 − 𝑝).
𝑃(𝑥) − Binomial probability - the probability that an 𝑛-trial binomial experiment results in
exactly 𝑥 successes, when the probability of success on an individual trial is 𝑝.
$$\frac{n!}{(n - x)!\,x!}$$
is the number of combinations of 𝑥 items selected from 𝑛 different items
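The formula translates almost directly into code. The sketch below is our own illustration; the function name `binomial_pmf` is hypothetical, and it relies on `math.comb` from the Python standard library for the number of combinations. The sample values (2 successes in 5 trials with 𝑝 = 0.3) are made up for the demonstration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on one trial
    return comb(n, x) * p ** x * q ** (n - x)

# Hypothetical illustration: 2 successes in 5 trials with p = 0.3.
print(binomial_pmf(2, 5, 0.3))

# The probabilities of all possible numbers of successes add up to 1.
total = sum(binomial_pmf(x, 5, 0.3) for x in range(6))
print(total)
```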
Example 1
If a coin is flipped 10 times, what is the probability of having exactly 4 heads on the outcome?
Recall that flipping a coin is a binomial experiment because it meets the four criteria.
Each flip is independent of one another. There are only two possible outcomes, heads
and tails.
Because the question is asking for the probability of 4 heads, flipping heads will be
considered a “success”. Thus 𝑛 = 10, 𝑥 = 4, 𝑝 = 0.5, and 𝑞 = 0.5.
$$P(x) = \frac{n!}{x!\,(n - x)!}\; p^x q^{n-x}$$
$$P(4) = \frac{10!}{4!\,(10 - 4)!}\,(0.5)^4 (0.5)^{10-4}$$
The calculation of
$$\frac{10!}{4!\,(10 - 4)!} = 210$$
determines how many ways there are of having 4 heads out of 10 flips. Because the order of
the 4 heads does not matter, there are 210 ways to arrange 4 heads among 10 flips.
The calculation of (0.5)⁴ gives the probability of flipping 4 heads, and (0.5)⁶ gives the
probability of flipping tails on the remaining 6 flips.
𝑃(4) = 210(0.0625)(0.015625)
𝑃(4) = 0.205078125
𝑃(4) =̇ 0.205
Example 2
In an experiment, a coin is flipped 4 times. Create a probability distribution table for this
experiment by considering the number of heads in the outcome.
Let 𝑥 be the random variable representing the number of heads in the outcome
𝒙 𝑷(𝒙)
0 𝑃(0)
1 𝑃(1)
2 𝑃(2)
3 𝑃(3)
4 𝑃(4)
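$$P(0) = \frac{4!}{0!\,(4 - 0)!}\,(0.5)^0 (0.5)^{4-0} = 0.0625$$
$$P(1) = \frac{4!}{1!\,(4 - 1)!}\,(0.5)^1 (0.5)^{4-1} = 0.25$$
$$P(2) = \frac{4!}{2!\,(4 - 2)!}\,(0.5)^2 (0.5)^{4-2} = 0.375$$
$$P(3) = \frac{4!}{3!\,(4 - 3)!}\,(0.5)^3 (0.5)^{4-3} = 0.25$$
$$P(4) = \frac{4!}{4!\,(4 - 4)!}\,(0.5)^4 (0.5)^{4-4} = 0.0625$$
[Histogram: 𝑃(𝑥) on the vertical axis against 𝑥 = 0 to 4 on the horizontal axis, with bar heights 0.0625, 0.25, 0.375, 0.25, and 0.0625]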
Example 3
In 2007, 75% of the population in the Island of Micronesia was reported to be infected with
the Zika virus.
a) If 15 individuals are selected with replacement from the population, what is the
probability of selecting exactly 3 people with the Zika virus?
𝑛 = 15
𝑥=3
𝑝 = 0.75
𝑞 = 0.25
$$P(x) = \frac{n!}{x!\,(n - x)!}\; p^x q^{n-x}$$
$$P(3) = \frac{15!}{3!\,(15 - 3)!}\,(0.75)^3 (0.25)^{15-3}$$
𝑃(3) = 455(0.421875)(0.0000000596)
𝑃(3) =̇ 0.000011441
The probability of selecting exactly 3 people with the Zika virus is about 0.00114%.
b) If 15 individuals are selected with replacement from the population, what is the
probability of selecting exactly 5 people without the Zika virus?
𝑛 = 15
𝑥=5
𝑝 = 0.25
𝑞 = 0.75
$$P(x) = \frac{n!}{x!\,(n - x)!}\; p^x q^{n-x}$$
$$P(5) = \frac{15!}{5!\,(15 - 5)!}\,(0.25)^5 (0.75)^{15-5}$$
𝑃(5) = 3003(0.000976562)(0.056313514)
𝑃(5) =̇ 0.165
The probability of selecting exactly 5 people without the Zika virus is about 16.5%.
c) If 15 individuals are selected with replacement from the population, what is the
probability of selecting 13 or more individuals with the Zika virus?
Selecting 13 or more individuals means that 13 people have the virus, or 14 people
have the virus, or all 15 people have the virus.
$$P(13) = \frac{15!}{13!\,(15 - 13)!}\,(0.75)^{13} (0.25)^{15-13} = 0.155907045$$
$$P(14) = \frac{15!}{14!\,(15 - 14)!}\,(0.75)^{14} (0.25)^{15-14} = 0.066817305$$
$$P(15) = \frac{15!}{15!\,(15 - 15)!}\,(0.75)^{15} (0.25)^{15-15} = 0.013363461$$
$$P(x \ge 13) = P(13) + P(14) + P(15) = 0.236087811$$
The probability of selecting 13 or more people with the Zika virus is about 23.6%.
d) If 5 individuals are selected with replacement from the population, what is the
probability that at least one is infected with the Zika virus?
Selecting at least one individual with the Zika virus means one or more individuals
have the virus. The complement of this event is that none of the 5 individuals has the virus.
This means that the probability of at least one individual can be calculated by
$$P(x \ge 1) = 1 - P(0)$$
$$P(x) = \frac{n!}{x!\,(n - x)!}\; p^x q^{n-x}$$
$$P(0) = \frac{5!}{0!\,(5 - 0)!}\,(0.75)^0 (0.25)^{5-0}$$
𝑃(0) = 0.000976562
1 − 𝑃(0) = 0.999023437
Thus, the probability of having at least one person with the virus from 5 selected individuals is
about 99.9%.
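Questions about “13 or more” or “at least one” add up several binomial probabilities, or use the complement when that is shorter. The sketch below is our own illustration of both approaches for this example; the function names `binomial_pmf` and `at_least` are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def at_least(k, n, p):
    """Probability of k or more successes: P(k) + P(k + 1) + ... + P(n)."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

# 13 or more infected among 15 people, with p = 0.75.
p_13_or_more = at_least(13, 15, 0.75)

# At least one infected among 5 people, using the complement 1 - P(0).
p_at_least_one = 1 - binomial_pmf(0, 5, 0.75)

print(round(p_13_or_more, 3), round(p_at_least_one, 3))
```

The complement is the better choice when the event covers almost every outcome, because only one probability has to be calculated.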
Practice
1) In a consumer report, it was indicated that 79.5% of the flights in North America arrive
on time.
b) Among the upcoming 12 flights, what is the probability that exactly 2 of them will
not arrive on time?
c) Among the upcoming 12 flights, what is the probability that exactly half of them will
arrive on time?
d) Among the upcoming 5 flights, what is the probability that none of them will arrive
on time?
2) According to the Red Cross Blood Organization, the blood type most requested by
hospitals is type O. About 45% of the North American population has blood type O.
b) If 7 blood donors are selected at random, what is the probability that exactly 3 of
them will have blood type O?
c) If 10 blood donors are selected at random, what is the probability that exactly 6 will
have other blood types?
d) If 15 blood donors are selected at random, what is the probability that at least one
will have blood type O?
3) In a clinical test of a new medication, 65.3% of the subjects treated with 5mg of the
medication experienced nausea as a side effect.
a) If 8 subjects are randomly selected and treated with 5 mg of the medication, what is
the probability that at least 6 subjects will experience nausea?
b) If 10 subjects are randomly selected and treated with 5 mg of the medication, what
is the probability that 4 or 5 subjects will experience nausea?
c) If 6 subjects are randomly selected and treated with 5 mg of the medication, what is
the probability that none of them will experience the side effect?
a) If 9 consumers are randomly selected, what is the probability that all consumers
recognize the brand name?
b) If 7 consumers are randomly selected, what is the probability that at least 6 of them
recognize the brand name?
c) If 20 consumers are randomly selected, what is the probability that 50% of them do
not recognize the brand name?
4.5 – The Mean and Variance of a Binomial Probability Distribution
Recall that the central tendency and dispersion of a probability distribution can be described by
calculating the mean and variance of the probability distribution.
The following table shows a comparison between mean and variance for any probability
distribution and the same measures for the binomial probability distribution.
Example
Determine the mean and variance of the probability distribution of this experiment:
A die is rolled 3 times. A random variable is used to describe the number of fours rolled
in the outcomes.
𝑥 = {0, 1, 2, 3}
𝒙 𝑷(𝒙)
0 𝑃(0)
1 𝑃(1)
2 𝑃(2)
3 𝑃(3)
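$$P(0) = \frac{3!}{0!\,(3 - 0)!}\left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^{3-0} \doteq 0.578703703$$
$$P(1) = \frac{3!}{1!\,(3 - 1)!}\left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^{3-1} \doteq 0.347222222$$
$$P(2) = \frac{3!}{2!\,(3 - 2)!}\left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^{3-2} \doteq 0.069444444$$
$$P(3) = \frac{3!}{3!\,(3 - 3)!}\left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^{3-3} \doteq 0.00462962963$$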
Using the probability distribution table, the mean can be calculated by:
𝜇 = ∑[𝑥 ∙ 𝑃(𝑥)]
But since this probability distribution is a binomial one, the quicker way to determine
the mean is:
$$\mu = n \cdot p = (3)\left(\frac{1}{6}\right) = 0.5$$
Both formulas will give the same result. The mean is 0.5.
Calculating variance:
$$\sigma^2 = n \cdot p \cdot q = (3)\left(\frac{1}{6}\right)\left(\frac{5}{6}\right) = 0.41\overline{6}$$
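The claim that both routes give the same result can be verified exactly. The sketch below is our own illustration; it builds the probability distribution table with exact fractions and compares the direct sums against 𝑛 ∙ 𝑝 and 𝑛 ∙ 𝑝 ∙ 𝑞. The variance sum used here is the usual definition, the sum of (𝑥 − 𝜇)² ∙ 𝑃(𝑥).

```python
from fractions import Fraction
from math import comb

n = 3                # number of rolls
p = Fraction(1, 6)   # probability of rolling a four
q = 1 - p

# Probability distribution table: x -> P(x).
dist = {x: comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)}

# Direct calculation from the table.
mean_direct = sum(x * px for x, px in dist.items())
var_direct = sum((x - mean_direct) ** 2 * px for x, px in dist.items())

# Shortcut formulas for a binomial distribution.
mean_shortcut = n * p
var_shortcut = n * p * q

print(mean_direct, mean_shortcut, var_direct, var_shortcut)
```

The shortcuts apply only to binomial distributions; for any other table the direct sums are still needed.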
Practice
2) From the above question, what is the variance of the probability distribution?
3) A meteorologist estimates that on each of the next 15 days, the probability of rain during the
day is 46%. What is the mean of the probability distribution?
4) From the above question, what is the variance of the probability distribution?
Unit 5: Continuous Probability Distributions
- Uniform distribution
- Exponential distribution
- Normal distribution
1. For a continuous probability distribution, 𝑓(𝑥) ≥ 0 for all values x of the random
variable.
2. The total area under the graph of the probability distribution (under the graph of
𝑓(𝑥) = 𝑦) is always 1.
3. The probability that an observed value of 𝑥 falls between 𝑎 and 𝑏, 𝑃(𝑎 < 𝑥 < 𝑏), is the
area between 𝑎 and 𝑏 under the graph. For the regions of interest (for which we want
to calculate the probability), the probability that 𝑥 falls into the region is equal to the
area directly above the region and under the graph.
4. The probability of an exact value, 𝑃(𝑥 = 𝑎), is zero.
Example 1: Uniform Distribution
Assume that the time of birth of a New Year’s baby at a hospital will be a random time
between midnight and 2 AM. Let 𝒙 be the number of minutes after midnight that the baby
will be born.
a) Graph the probability distribution.
Because the time of birth will be a random time, we can assume that the probability of
birth at any minute 𝑥 after midnight is the same. Thus, the graph will show a uniform
probability distribution over the 120 minutes between midnight and 2 AM. The second
property states that the area under the graph is always 1. Thus, the height of the graph is
$$f(x) = \frac{1}{120} \doteq 0.008$$
b) Determine the probability of a baby being born between 12 AM and 1:30 AM.
The probability will be the area under the graph between 𝑥 = 0 and 𝑥 = 90.
The probability can be found by determining the area that is shaded, a rectangle of
width 90 and height 1/120:
$$P(0 < x < 90) = (90)\left(\frac{1}{120}\right) = 0.75$$
The probability, 𝑃(𝑥 = 0) = 0, because for continuous variables, the probability of an exact
value is zero. This means that the probability of a baby being born at exactly 12 AM (not a tenth
of a second before or after midnight), is zero.
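For a uniform distribution, every probability is the area of a rectangle, so it can be computed as width times height. The sketch below is our own illustration; the function name `uniform_probability` and its default limits (0 to 120 minutes, taken from this example) are assumptions made for the demonstration.

```python
def uniform_probability(a, b, low=0.0, high=120.0):
    """P(a < x < b) for x uniformly distributed between low and high."""
    height = 1.0 / (high - low)       # total area under the graph must be 1
    left = max(a, low)                # ignore any part outside [low, high]
    right = min(b, high)
    width = max(right - left, 0.0)
    return width * height

# Born between 12 AM (x = 0) and 1:30 AM (x = 90).
print(uniform_probability(0, 90))

# An exact value has zero width, so its probability is zero.
print(uniform_probability(0, 0))
```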
Example 2: Exponential Distribution
The waiting time at a health clinic has an exponential probability distribution modelled by the
formula:
$$f(x) = \frac{1}{5}\, e^{-\frac{1}{5}x}$$
where 𝒙 is the amount of wait-time in minutes
The following graph represents the probability distribution, 𝒇(𝒙), where 𝒙 is between 0
minutes to 20 minutes. Determining the area of the exponential curve is beyond the scope of
this course, so, the areas are labelled.
a) Determine the probability that you have to wait more than 10 minutes
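Although this course reads the areas from the labelled graph, they can also be computed. The sketch below is our own illustration and goes slightly beyond the text: it assumes the known result that, for this exponential distribution, the area to the right of 𝑡 equals 𝑒 raised to the power −𝑡/5. The function names are hypothetical.

```python
from math import exp

def prob_wait_longer_than(t, mean_wait=5.0):
    """Area under f(x) = (1/5) e^(-x/5) to the right of t minutes."""
    return exp(-t / mean_wait)

def prob_wait_between(a, b, mean_wait=5.0):
    """Area under the curve between a and b minutes (a <= b)."""
    return exp(-a / mean_wait) - exp(-b / mean_wait)

# Probability of waiting more than 10 minutes.
print(round(prob_wait_longer_than(10), 4))
```

The two functions are consistent with the second property of continuous distributions: the area from 0 to 𝑡 plus the area beyond 𝑡 is always 1.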
Practice
b) Determine the probability of an error occurring between the 1st and 5th minute
during the 3D-printing process.
c) Determine the probability of an error occurring after 20 minutes during the process.
d) Determine the probability of an error occurring in the first 3.5 minutes of the
process.
e) Determine the probability of an error occurring in the second half of the process.
5.2 – Normal and Standard Normal Distribution
A normal distribution is shaped like a bell curve. It is symmetrical about the vertical line through
the center, where the mean, 𝜇, is located. If a continuous random variable has a normal distribution, then
it can be described by the following equation
$$y = \frac{e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}}{\sigma\sqrt{2\pi}}$$
The normal distribution is important because many naturally occurring measurements follow
such a distribution if the frequencies of the measurements are recorded. For example, heights,
weights, and test scores tend to follow a normal distribution.
Recall that mean, median, and mode are measures of central tendency. In a normal
distribution, the mean, median, and mode are the same value. Standard deviation is one of the
measures of variation in a data set. It measures, on average, how far the data values
deviate from the mean.
When the mean is 0 and the standard deviation is 1, the normal distribution is known as a
standard normal distribution.
The location of every data value can be described by its position relative to the
mean, using the standard deviation as the unit of measurement. This is known as the z-score of a data
value.
The Z-Score
Consider a normal variable 𝑥 with the mean 𝜇 and standard deviation 𝜎. We define the
standard score (or z score) by the formula:
$$z = \frac{x - \mu}{\sigma}$$
The z-score of the data values that are below the mean will be negative.
The z-score of the data values that are above the mean will be positive.
Probability
From the previous chapter, we learned that the area between two values 𝑎 and 𝑏 under the
curve of the probability distribution represents the probability, 𝑃(𝑎 < 𝑥 < 𝑏)
The Empirical Rule states that there is approximately a fixed amount of area within each
standard deviation from the mean.
Thus, approximately 68.2% of the data is within 1 standard deviation of the mean.
Approximately 95.4% of the data is within 2 standard deviations of the mean.
Approximately 99.7% of the data is within 3 standard deviations of the mean.
Translating to the concepts of probability, one can conclude that there is a 68.2% probability of
finding a data value that is within 1 standard deviation of the mean.
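The percentages in the Empirical Rule are rounded areas under the standard normal curve. The sketch below is our own illustration; it uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later), and the function name `area_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 2))
```

Because the z-score rescales every normal distribution to the standard one, the same areas apply whatever the values of 𝜇 and 𝜎.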
Example 1
Studies show that the score of a mathematics entrance test for college students in Canada is
normally distributed, with a mean of 68.0% and a standard deviation of 4.3%.
$$z = \frac{90 - 68}{4.3} = \frac{22}{4.3} = 5.11627907$$
$$z \doteq 5.12$$
z-scores are usually rounded to 2 decimal places.
The test score, 90%, is about 5.12 standard deviations above the mean.
$$z = \frac{55 - 68}{4.3} = \frac{-13}{4.3} = -3.023255814$$
$$z \doteq -3.02$$
A negative z-score means that the test score, 55%, is below the mean.
The test score, 55%, is about 3.02 standard deviations below the mean.
c) What test score is located 2.5 standard deviations above the mean?
$$2.5 = \frac{x - 68}{4.3}$$
𝑥 = (2.5)(4.3) + 68
𝑥 = 78.75
The test score, 78.75%, is located 2.5 standard deviations above the mean test score.
Using the Empirical Rule, notice that the areas below 1𝜎 sum to about 84% of the total
area. That means the test score located at 1𝜎 is higher than about 84% of all the test scores.
$$1 = \frac{x - 68}{4.3}$$
𝑥 = (1)(4.3) + 68
𝑥 = 72.3
The test score that outperforms 84% of all the test scores is 72.3%.
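The z-score formula can be used in both directions: from a data value to its z-score, and from a z-score back to a data value. The sketch below is our own illustration of the calculations in this example; the function names `z_score` and `value_at` are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def value_at(z, mu, sigma):
    """Data value located z standard deviations from the mean."""
    return z * sigma + mu

mu, sigma = 68.0, 4.3  # mean and standard deviation of the test scores

print(round(z_score(90, mu, sigma), 2))
print(round(z_score(55, mu, sigma), 2))
print(value_at(2.5, mu, sigma))
print(value_at(1, mu, sigma))
```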
The Z-Score Table
From the above example, it is evident that knowing the z-score of a data value and the
Empirical Rule is very useful. Knowing the area under the curve can help us determine the
percentage of data that is below or above a specific data value. What if the area above or
below a specific data value is of interest, but that data value is located at, say, 1.34 standard
deviations above the mean? Notice the Empirical Rule only provides areas between 1𝜎, 2𝜎, 3𝜎,
and −1𝜎, −2𝜎, −3𝜎. What if the data value of interest is not located at those exact points?
The Z-score Table is helpful in those situations. It provides the estimated area under the normal
distribution at various z-score values.
The z-score values, to one decimal place, run along the first column on the left of the table. Along the top row are the
values of the hundredths digit of a z-score. The body of the table provides the areas to the left of
the z-score of interest.
For example, if you wish to determine the area to the left of the z-score 1.43, begin by using the
page with positive z-scores on the first column. Find 1.4. Move along that row and stop at the
column with the heading 0.03. The value located in that cell is 0.9236. The area to the left of
the z-score 1.43 is 0.9236. This also means that 92.36% of the data is below the data value that
is 1.43 standard deviations above the mean.
Example 2
Determine the area under the normal distribution for the following z-scores.
a) 𝒛 < 𝟐. 𝟎𝟏
b) 𝒛 > −𝟏. 𝟐𝟐
c) −𝟐. 𝟎𝟐 < 𝒛 < 𝟏. 𝟕𝟒
Reading the larger area first, it is the area to the left of the z-score 1.74. The area is
0.9591.
The smaller area is the area to the left of the z-score −2.02. The area is 0.0217.
Therefore, the area in between is the difference between the above areas:
0.9591 − 0.0217 = 0.9374
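Software can stand in for the z-score table. The sketch below is our own illustration of the three kinds of areas in this example; it uses `NormalDist().cdf` from Python's standard `statistics` module, which returns the area to the left of a z-score, the same quantity that the body of the table provides.

```python
from statistics import NormalDist

# Area to the left of a z-score under the standard normal curve.
area_left_of = NormalDist().cdf

area_a = area_left_of(2.01)                        # a) z < 2.01
area_b = 1 - area_left_of(-1.22)                   # b) z > -1.22
area_c = area_left_of(1.74) - area_left_of(-2.02)  # c) -2.02 < z < 1.74

print(round(area_a, 4), round(area_b, 4), round(area_c, 4))
```

An area to the right is one minus the area to the left, and an area in between is the larger left area minus the smaller one.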
Example 3
a) Determine the probability of having a chance of precipitation that is lower than 50%.
𝑧 = 2.13
Find 𝑃(𝑧 < 2.13) using the z-score table and reading the area to the left of the z-score
The probability of having a chance of precipitation that is lower than 50% is 98.34%
First, we use the z-score table to determine the z-score that has 80% of area above it.
Recall that the table provides areas to the left of a z-score. Thus, when we read the
table, we are looking for the z-score with 20% area below it.
$$x_0 = (-0.84)(7.6) + 34$$
$$x_0 = 27.616$$
Because 20% of the data is below the value 27.616, this data value is also known as the
20th percentile.
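Reading the table “in reverse” to find a percentile corresponds to the inverse of the area function. The sketch below is our own illustration; it assumes the mean of 34 and standard deviation of 7.6 that appear in the calculation above, and uses `NormalDist.inv_cdf` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Mean and standard deviation taken from the calculation above.
dist = NormalDist(mu=34, sigma=7.6)

# Data value with 20% of the area below it (and 80% above it).
x_20 = dist.inv_cdf(0.20)
print(round(x_20, 2))

# The matching z-score, for comparison with the table value of -0.84.
z_20 = NormalDist().inv_cdf(0.20)
print(round(z_20, 2))
```

The result differs slightly from the hand calculation because the table z-score is rounded to 2 decimal places.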
Practice
3) The height of babies at the age of 6 months follows a normal distribution. From a study
that collected heights from a sample of babies, it was found that the mean is 26.7 inches
and the standard deviation is 0.45 inches.
a) What is the probability of finding a baby at the age of 6 months with a height that is
less than 27 inches?
b) What is the probability of finding a baby at the age of 6 months with a height that is
greater than 26.9 inches?
c) What is the probability of finding a baby at the age of 6 months with a height that is
within 1 inch of the mean?
d) What percentage of babies can be found to be taller than 25.5 inches at the age of 6
months?
4) At a company, the amount of retirement savings employees invest per month was
recorded. The data follows a normal distribution with a mean of $360 and a standard
deviation of $22.30.
a) What is the probability of finding an employee who invests more than $400 per
month?
b) What is the probability of finding an employee who invests between $350 and $420
per month?
c) Below what amount of investment will we find 75% of the employees’ investment?
5.3 – Central Limit Theorem
The Central Limit Theorem (CLT) states that if random samples are taken from a population and
the mean in each random sample is calculated, the means tend to form a normal distribution.
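[Figure: probability distribution of the average value of two dice; horizontal axis “Average Value of Two Dice”, from 1 to 6 in steps of 0.5]
[Figure: probability distribution of the average value of three dice; horizontal axis from 1 to 6 in steps of one third]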
In the first case, 𝑛 = 1 because one die is rolled and only one value is recorded. Notice that the
probability distribution is not a normal distribution.
In the second case, 𝑛 = 2 since two values are recorded when we roll two dice.
Each time two dice are rolled, it is considered a random sample. The mean is found in each
random sample. The probability of obtaining each mean is calculated. Notice that the
probability distribution is now approximating the normal distribution.
In the third case, 𝑛 = 3 since three values are rolled in each random sample. Multiple random
samples are taken, and the mean of the three values rolled is found for each sample. The
probability of obtaining each mean is calculated. Notice that the probability distribution is even
closer to the normal distribution.
The mean calculated from each random sample is known as the sample mean.
When all the sample means are used to create a frequency distribution, the distribution is
known as a sampling distribution. In the second and the third event, the probability
distributions are both sampling distributions since the sample means are recorded and
graphed.
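The dice cases above can be reproduced exactly by listing every possible roll. The following is a minimal sketch in Python; the function name `mean_distribution` is a hypothetical helper introduced here for illustration, not something from the text.

```python
from fractions import Fraction
from itertools import product


def mean_distribution(n_dice):
    """Exact probability distribution of the mean of n_dice fair six-sided dice."""
    outcomes = list(product(range(1, 7), repeat=n_dice))
    dist = {}
    for roll in outcomes:
        mean = Fraction(sum(roll), n_dice)
        # Every roll is equally likely, so each one adds 1/len(outcomes).
        dist[mean] = dist.get(mean, 0) + Fraction(1, len(outcomes))
    return dict(sorted(dist.items()))


for n in (1, 2, 3):
    print(f"n = {n}")
    for mean, prob in mean_distribution(n).items():
        # The row of '#' characters is a rough text histogram of the probability.
        bar = "#" * round(float(prob) * 100)
        print(f"  {float(mean):4.2f}  {float(prob):.4f}  {bar}")
```

Printing the three distributions one after another shows the flat shape for 𝑛 = 1 giving way to a peaked, roughly bell-shaped outline for 𝑛 = 2 and 𝑛 = 3.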
If random samples of 𝑛 observations are taken from a population that is not normally
distributed, with a mean of 𝜇 and a standard deviation of 𝜎, then as 𝑛 increases, the sampling
distribution of the sample mean 𝑥̅ becomes approximately normal, with a mean of 𝜇 and
a standard deviation of $\sigma/\sqrt{n}$.
In other words,
The mean of the sampling distribution is $\mu_{\bar{x}} = \mu$
The standard deviation of the sampling distribution is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
If the population the random samples are drawn from is a normal distribution, the sampling
distribution of 𝑥̅ will also be normal.
If the population the random samples are drawn from is a uniform distribution, the sampling
distribution of 𝑥̅ will approximate the normal distribution as soon as 𝑛 = 3.
If the population the random samples are drawn from is a skewed distribution, the sampling
distribution of 𝑥̅ will approximate the normal distribution when 𝑛 is at least 30.
The examples below will demonstrate how the Central Limit Theorem (CLT) is applied.
Example
The weight of adult males in the population is a normal distribution that has a mean of 182.9
lb and a standard deviation of 40.8 lb. When designing elevators, one important
consideration is the weight capacity. Most of the current elevators have a weight capacity of
16 passengers with a total weight of 5000 lb. This assumes that the mean weight
of one passenger is 312.5 lb.
What is the probability that the mean of 16 passengers will exceed 180 lb?
If the capacity is 16 passengers, then any 16 people on the elevator can be considered
to be a random sample of the population.
𝑛 = 16
This is a small sample because it is below 30; however, the sample is taken from a
population that has a normal distribution, so the sampling distribution will also be a
normal distribution.
The sampling distribution will have the following mean and standard deviation:
$$\mu_{\bar{x}} = \mu = 182.9$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{40.8}{\sqrt{16}} = 10.2$$
Since we are trying to find the probability that the mean of 16 passengers will exceed
180 lb, we need to determine the z-score and the area to the right of 180 lb.
$$\bar{x} = 180$$
$$z = \frac{180 - 182.9}{10.2} \approx -0.28$$
The area to the right of 𝑧 = −0.28 is 0.6103, so the probability that the mean of 16
passengers will exceed 180 lb is 61.03%.
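As a check on the elevator example, the same calculation can be sketched with Python's standard library. The variable names are illustrative only. Because the example rounds 𝑧 to two decimal places before reading the table, the sketch computes the probability both ways; the unrounded value differs slightly from the table-based one.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 182.9, 40.8, 16

# Standard deviation of the sampling distribution of the mean.
standard_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, standard_error)

# z-score of a sample mean of 180 lb.
z = (180 - mu) / standard_error

# Area to the right of 180 lb, without rounding z.
p_exact = 1 - sampling_dist.cdf(180)

# Area to the right when z is rounded to two decimals, as with a z table.
p_table = 1 - NormalDist().cdf(round(z, 2))

print(f"standard error = {standard_error:.1f}")
print(f"z = {z:.4f}")
print(f"P(mean > 180), unrounded z = {p_exact:.4f}")
print(f"P(mean > 180), z rounded   = {p_table:.4f}")
```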
Practice
1) The population of quarters has a mean of 5.67 g, but the weights of individual quarters
vary. The distribution of its weight has a standard deviation of 0.1 g. Random samples of
32 quarters are taken.
b) What is the probability that a random sample will have a mean greater than 5.68 g?
c) What is the probability that a random sample will have a mean less than 5.65 g?
3) The mean wage of the employees of a company is $31,450 and the standard deviation is
$980.
a) What is the probability that the mean wage of 35 randomly selected workers will
exceed $32,000?
b) What is the probability that the mean wage of 50 randomly selected workers will be
less than $31,100?
c) If 100 workers are selected in a random sample, what is the wage that exceeds 80% of
the sample means?
4) The normal daily human potassium requirement is in the range of 2000 to 6000 mg. The
potassium content in a glass of orange juice is normally distributed with a mean of 470
mg and a standard deviation of 22 mg.
a) If a person drinks 3 glasses of orange juice in a day, what is the shape of the
sampling distribution? Explain.
b) If a person drinks 3 glasses of orange juice in a day, what is the probability that the
person will have an amount of potassium that is greater than 500 mg?
c) If the potassium levels of 10 glasses of orange juice were measured for nutritional
information collection, what is the probability that the amount of potassium will be
between 450 and 550 mg?
Unit 6: Statistical Inference
6.1 – Estimating Population Mean for Large Samples
Making a statistical inference means estimating, and forming judgements about, the parameters
of a population based on a sample.
For example, what is the average tuition that Ontario college students pay per year? The
average tuition of Ontario college students is the population mean being estimated. To make
this estimation, a survey could be conducted to take a sample of a group of college students.
Another example would be, what is the proportion of all Ontario college students who rely on
taking the bus to school? Again, a survey could be conducted to take a sample of a group of
college students.
Before discussing the process of making an estimation of a population parameter, there are a
few definitions:
Confidence Intervals
The most common confidence levels are 90%, 95%, and 99%.
A confidence level communicates how confident we are in our estimation.
Every confidence level has a critical value, 𝑧𝛼/2 , associated with it, where 𝛼 is 1 minus the
confidence level. The critical value is the 𝑧-score that separates an area of 𝛼/2 in each tail of
the standard normal distribution.
Using the standard normal distribution, we can determine the critical values for the
following common confidence levels.
Confidence level   𝛼      𝑧𝛼/2
90%                0.10   1.645
95%                0.05   1.96
99%                0.01   2.575
For example, if the confidence level is 90%, then about 90% of the confidence intervals
constructed this way from repeated samples would contain the true value.
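The critical values for the common confidence levels can also be computed rather than read from a table. This is a minimal sketch using Python's standard library; the function name `z_critical` is a hypothetical helper. Note that the computed 99% value is about 2.576, while this text uses the table value 2.575.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """z-score that leaves an area of alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: alpha = {1 - level:.2f}, z = {z_critical(level):.3f}")
```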
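Finding Margin of Error using Confidence Intervals
The maximum likely difference between the sample statistic and the population parameter,
at a given confidence level, is known as the Margin of Error.
When estimating a population mean, the margin of error is calculated using
$$E = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
so the confidence interval estimate of the population mean is
$$\mu = \bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$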
Example 1
A sample of 40 values was taken and it was found that the sample mean is 16 and the sample
standard deviation is 2.5. Estimate the population mean at a 95% confidence level.
$$\mu = \bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
$$\mu = 16 \pm 1.96 \cdot \frac{2.5}{\sqrt{40}}$$
$$\mu = 16 \pm 0.774758026$$
We are 95% confident that the population mean is between 15.23 and 16.77.
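The interval in Example 1 can be reproduced with a few lines of Python. This is only a sketch; `mean_confidence_interval` is a hypothetical helper name, and the critical value 1.96 is passed in by hand so that the result matches the table-based calculation above.

```python
from math import sqrt


def mean_confidence_interval(x_bar, s, n, critical_value):
    """Return (lower limit, upper limit, margin of error) for a population mean."""
    margin = critical_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


lower, upper, margin = mean_confidence_interval(x_bar=16, s=2.5, n=40, critical_value=1.96)
print(f"E = {margin:.9f}")
print(f"Interval: {lower:.2f} to {upper:.2f}")
```

Passing the critical value as an argument keeps the same function usable with a 𝑡 value when the sample is small.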
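Example 2
$$\mu = \bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
$$\mu = 0.458 \pm 2.575 \cdot \frac{0.082}{\sqrt{35}}$$
$$\mu = 0.458 \pm 0.035690864$$
The critical value 2.575 corresponds to a 99% confidence level, so we are 99% confident that
the population mean is between 0.42231 and 0.49369.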
Practice
1) A sample of 100 values was taken and it was found that the sample mean is 56 and the
sample standard deviation is 11.5. Estimate the population mean at a 99% confidence
level.
2) The level of acidity in a liquid is measured using pH. Water samples taken from 58
rainfalls were analyzed to find a mean pH of 3.9 and a standard deviation of 0.6.
Estimate the pH of all rainfalls using a 90% confidence level.
3) A survey was conducted in a random sample of 500 people in Toronto and it was found
that the mean amount of time spent commuting to work was 2.4 hours per day and the
standard deviation was 0.5 hours. Using a 95% confidence level, estimate the mean
amount of time spent commuting to work for all Torontonians.
4) A machine is designed to produce rubber gaskets with a mean thickness of 0.125 inches
and a standard deviation of 0.01 inches. This is a normal distribution. Over time, usage
of the machine changes the mean of the thickness in the gaskets produced by the
machine. How many gaskets should be measured to be able to estimate the new mean
with a maximum error of the estimate as 0.001 inches and a 90% level of confidence?
6.2 – Estimating Population Mean for Small Samples
When a sample is small, the sampling distribution will approximate the normal distribution but
will not necessarily be an accurate bell-shaped curve. Because of this, z-scores are not
appropriate as a measure in these calculations.
Student’s t distribution
The Student’s t distribution is a distribution published by W. S. Gosset in 1908. He worked at
the Guinness brewery at the time and used statistics in his work on the brewing process. He
published his work under the pseudonym “Student”. When a small sample (𝑛 < 30) is taken
from a population that is normally distributed, the sampling distribution tends to follow a
𝑡-distribution.
Notice it is very similar to the Standard Normal distribution. There are different 𝑡-distributions
because for a different sample size, the shape of the curve is slightly different.
Properties of the 𝒕-distribution:
1) It has a bell-shaped curve that is symmetrical about 𝑡 = 0
2) As 𝑛 becomes very large, the 𝑡-distribution approaches the standard normal distribution
3) The shape of the 𝑡-distribution is determined by the degrees of freedom,
𝑑𝑓 = 𝑛 − 1
To determine the area under the curve at various 𝑡-values, the 𝑡-distribution table is used.
The values along the first column on the left are the Degrees of Freedom. The values along the
top row are the areas to the right of a specific 𝑡 value.
For example, for a sampling distribution when the sample size is 11, 𝑑𝑓 = 10. On the curve
with 𝑑𝑓 = 10, the 𝑡 value 0.700 has an area of 0.25 to its right.
Example 1
a) Determine the 𝒕 value such that the area under the curve to the right of the value is
0.15 for a Student’s 𝒕 distribution with 𝒅𝒇 = 𝟐𝟑
At the row where 𝑑𝑓 = 23 and the column with the heading 0.15, the 𝑡 value reads 1.060.
b) Determine the 𝒕 value such that the area under the curve to the left of the value is
0.90 for a Student’s 𝒕 distribution with 𝒏 = 𝟏𝟐
If 0.90 is the area to the left of a 𝑡 value, then the area to the right of it is 0.10.
If 𝑛 = 12, then 𝑑𝑓 = 11
At the row where 𝑑𝑓 = 11 and the column with the heading 0.10, the 𝑡 value reads
1.363
c) Determine the 𝒕 value such that the area under the curve centered around the mean is
95% for a Student’s 𝒕 distribution with 𝒏 = 𝟏𝟓
If 95% is the area centered around the mean, then the remaining 5% is divided equally
between the two tails, which leaves 2.5% in each tail.
In this case, 𝑑𝑓 = 14
At the row where 𝑑𝑓 = 14 and the column with the heading 0.025, the 𝑡 value reads
2.145.
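The table lookups in this section can be checked numerically. The sketch below is one possible approach, not the method used to build the table: it integrates the density of the 𝑡-distribution with Simpson's rule, using only Python's standard library. The function names `t_pdf` and `right_tail_area` are hypothetical helpers, and the routine assumes a 𝑡 value of zero or more.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def right_tail_area(t, df, steps=2000):
    """Area to the right of t (t >= 0), via Simpson's rule on the interval [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the area lies to the right of 0; subtract the part between 0 and t.
    return 0.5 - total * h / 3


# (t value, degrees of freedom) pairs read from the table in this section.
for t_value, df in [(0.700, 10), (1.060, 23), (1.363, 11), (2.145, 14)]:
    print(f"df = {df:2d}, t = {t_value:.3f}, area to the right = {right_tail_area(t_value, df):.4f}")
```

Each computed area should agree with the column heading of the table to about three decimal places, since the table values themselves are rounded.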
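Depending on the sample size, the method of estimating a population mean uses either
𝑧-scores or 𝑡-values:
Large sample (𝑛 ≥ 30): 𝑧-scores
Small sample (𝑛 < 30): 𝑡-values, with 𝑑𝑓 = 𝑛 − 1
Depending on the sample size, the critical values associated with the following common
confidence levels would either be 𝑧𝛼/2 or 𝑡𝛼/2 :
Confidence level   𝛼      𝑧𝛼/2 (large sample)   𝑡𝛼/2 (small sample)
90%                0.10   1.645                 from the 𝑡 table, 𝑑𝑓 = 𝑛 − 1
95%                0.05   1.96                  from the 𝑡 table, 𝑑𝑓 = 𝑛 − 1
99%                0.01   2.575                 from the 𝑡 table, 𝑑𝑓 = 𝑛 − 1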
Example 2
The mean amount of sodium in a sample of 20 bottles of ketchup is 142 mg with a standard
deviation of 2.3 mg. The amounts are normally distributed. Construct a 99%
confidence interval estimate of the mean amount of sodium in ketchup.
The sample size is 20, which means that the 𝑡-distribution is more appropriate for the
estimation of the population mean.
The critical value associated with a 99% confidence level for 𝑑𝑓 = 19 is $t_{0.01/2} = 2.861$
To calculate the Margin of Error to determine the upper and lower confidence limits, we
use $t_{\alpha/2}$, which is $t_{0.01/2} = 2.861$ in this case.
$$\mu = \bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
$$\mu = 142 \pm 2.861 \cdot \frac{2.3}{\sqrt{20}}$$
$$\mu = 142 \pm 1.471399811$$
We are 99% confident that the estimated mean amount of sodium in all ketchup is
between 140.53 mg and 143.47 mg.
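To see why the choice between 𝑧 and 𝑡 matters for a small sample, the ketchup example can be recomputed with both critical values. This is an illustrative sketch only: 2.861 is the 𝑡 value read from the table for 𝑑𝑓 = 19, and 2.575 is the 99% 𝑧 value used elsewhere in this text.

```python
from math import sqrt

x_bar, s, n = 142, 2.3, 20

t_crit = 2.861  # t table, df = 19, area 0.005 in each tail
z_crit = 2.575  # z value for a 99% confidence level

margin_t = t_crit * s / sqrt(n)
margin_z = z_crit * s / sqrt(n)

print(f"t interval: {x_bar - margin_t:.2f} to {x_bar + margin_t:.2f} (E = {margin_t:.4f})")
print(f"z interval: {x_bar - margin_z:.2f} to {x_bar + margin_z:.2f} (E = {margin_z:.4f})")
```

The 𝑡 interval is wider, which reflects the extra uncertainty that comes with a small sample.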
Practice
1) The mean number of hours of tutoring services used per week by college students was
found to be 3.8 in a sample of 24 students. The standard deviation from the sample is
0.31 hours. Construct a 99% confidence interval estimate of the mean number of hours
per week used by all college students.
2) In a test of weight loss programs, 14 adults were sampled. After 6 months, their mean
weight loss was found to be 22.4 lb with a standard deviation of 6.1 lb.
Construct a 95% confidence interval estimate of the mean weight loss for all adults.
4) The amount of time it takes a customer to complete an order during online shopping
was measured. When sampling 18 customers, it was found that the mean amount of
time was 13.4 minutes and the standard deviation was 2.5 minutes.
a) Using a 95% confidence level, estimate the amount of time it takes a customer to
complete an order during online shopping in general.
b) Using a 90% confidence level, estimate the amount of time it takes a customer to
complete an order during online shopping in general.
c) Compare the confidence intervals in the above questions. Identify the interval that is
wider. Explain why that interval is wider.
d) If the company decided that they would need to improve their website when there
is a chance it will take customers longer than 15 minutes to complete an order
during online shopping, based on what was calculated, would the company need to
make improvements?
6.3 – Estimating Population Proportion for Large and Small Samples
Another population parameter that can be estimated is population proportion. Making this
estimation could be useful in situations where the following questions arise:
- What proportion of Ontario voters will vote for the Liberal Party?
- What proportion of grapes in the vineyard will survive through the winter?
- What proportion of college students will graduate this semester?
1) Determine the critical value associated with the confidence level, 𝑧𝛼/2 or 𝑡𝛼/2
2) Calculate the Margin of Error, 𝐸
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \quad \text{or} \quad E = t_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{q} = 1 - \hat{p}$
3) Using the value of 𝐸, calculate the lower confidence limit 𝑝̂ − 𝐸 and the upper
confidence limit 𝑝̂ + 𝐸, which will define the confidence interval
$$p = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \quad \text{or} \quad p = \hat{p} \pm t_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Example 1
The automated phone survey of 1,049 Toronto residents was done Wednesday night…
A new poll from Forum Research puts the mayor-in-name-only holding steady at 42
per cent, which is down just slightly from two weeks ago after Ford admitted to
smoking crack cocaine.
In the above example, the sample size is 1049. The sample proportion is 42% or 0.42.
In other words, 𝑝̂ = 0.42 and 𝑞̂ = 0.58
The sample size is greater than 30; thus, the critical value to be used is
$z_{\alpha/2} = z_{0.10/2} = 1.645$ for a 90% confidence level.
If the City of Toronto would like to estimate the population proportion to determine the
proportion of all Toronto residents who are supporting Ford at a confidence level of
90%, then the Margin of Error is
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$E = 1.645\sqrt{\frac{(0.42)(0.58)}{1049}}$$
$$E = 0.025067833$$
This means that the interval is defined by the following lower and upper confidence
limits:
$$\hat{p} - E = 0.42 - 0.025067833 \approx 0.395$$
$$\hat{p} + E = 0.42 + 0.025067833 \approx 0.445$$
In conclusion, the city can state that they are 90% confident that the proportion of all
Toronto residents who support Ford is between 39.5% and 44.5%.
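The margin of error in Example 1 can be checked with a short Python sketch. The function name `proportion_margin` is a hypothetical helper, and the critical value 1.645 is supplied by hand to match the calculation above.

```python
from math import sqrt


def proportion_margin(p_hat, n, critical_value):
    """Margin of error for a population proportion estimate."""
    q_hat = 1 - p_hat
    return critical_value * sqrt(p_hat * q_hat / n)


p_hat = 0.42
margin = proportion_margin(p_hat, n=1049, critical_value=1.645)
print(f"E = {margin:.9f}")
print(f"Interval: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
```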
Example 2
For quality control in the meat department, 25 samples of beef packages were taken and it
was found that 3 were contaminated. Using a 95% confidence level, determine the
proportion of all beef packages that might be contaminated.
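The sample size is 25. So, $t_{0.05/2}$ with 𝑑𝑓 = 24, which is 2.064, will be used for a
95% confidence level.
The sample proportion, $\hat{p} = \frac{3}{25} = 0.12$
So, $\hat{q} = 0.88$
$$E = t_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$E = 2.064\sqrt{\frac{(0.12)(0.88)}{25}}$$
$$E = 0.134144122$$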
This means that the interval is defined by the following lower and upper confidence
limits:
$$\hat{p} - E = 0.12 - 0.134144122 \approx -0.014$$
$$\hat{p} + E = 0.12 + 0.134144122 \approx 0.254$$
It is not possible for −1.4% of the beef packages to be contaminated. Thus, the
lower confidence limit will be accepted as 0%.
In conclusion, we are 95% confident that the proportion of all beef packages that are
contaminated is between 0% and 25.4%.
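The adjustment of the lower limit to 0% can be built into the calculation. The following sketch assumes the same formula as the example and simply clips the limits so they stay between 0 and 1; `proportion_interval` is a hypothetical helper name, and 2.064 is the 𝑡 value used above.

```python
from math import sqrt


def proportion_interval(successes, n, critical_value):
    """Confidence limits for a proportion, clipped to the valid range 0 to 1."""
    p_hat = successes / n
    margin = critical_value * sqrt(p_hat * (1 - p_hat) / n)
    lower = max(0.0, p_hat - margin)  # a proportion cannot be negative
    upper = min(1.0, p_hat + margin)  # a proportion cannot exceed 1
    return lower, upper


lower, upper = proportion_interval(successes=3, n=25, critical_value=2.064)
print(f"Interval: {lower:.1%} to {upper:.1%}")
```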
Practice
1) In a recent study conducted in May, 2016, it was reported that 85% of the 900 call
centre workers sampled are working from home. Using a 90% confidence level, estimate
the proportion of all call centre workers in Canada that are working from home.
2) From a sample of 20 youths, 14 stated that they will be voting in the Federal
Elections. Using a 95% confidence level, determine the confidence interval for the
proportion of all youths who will vote in the Federal Elections.
3) In a poll, respondents were asked whether they felt vulnerable to identity theft. Of the
981 surveyed, 505 stated “yes”. Using a 99% confidence level, determine the confidence
interval for the population proportion that feels vulnerable to identity theft.
4) In a small city, the number of car accidents recorded in the last 6 months was 28. Of
those accidents, 12 of them resulted in injuries. Construct a 90% confidence interval to
estimate the proportion of all accidents that do not result in injuries.
6.4 – Hypothesis Testing of Population Mean of a Large Sample
Hypothesis testing refers to the process of using statistics to decide whether to accept or reject
hypotheses that are made. A hypothesis is a claim or statement about a property pertaining to
a population.
The following are examples of situations in which hypothesis testing would be useful:
1) A pilot project is conducted to determine which software program is more suitable for
students. The hypotheses might be:
o Software A is better
o Software B is better
2) Which pet food brand is more suitable for senior dogs?
The hypotheses might be:
o Pet food brand A is better
o Pet food brand B is better
3) Is a medication effective in lowering cholesterol for women?
The hypotheses might be:
o The medication is effective because cholesterol is found to be lowered
o The medication is not effective because cholesterol is not found to be lowered
Types of Hypotheses
1) The null hypothesis is a statement that the value of a population parameter is equal to
some claimed value. It is denoted by the symbol 𝐻0
2) The alternative hypothesis is a statement that the parameter has a value that differs
from the null hypothesis. It is denoted by the symbol 𝐻𝐴
1) Hypotheses
b) Identify the Null Hypothesis and the Alternative Hypothesis
c) Express the statements using symbolic form
2) Identify the Significance Level
3) Sampling Distribution
4) Test Statistic
The test statistic is a value used in making a decision about the null hypothesis.
Calculate the test statistic using the relevant formula:
Normal (𝑧): $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
𝑡-distribution: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
5) Critical Regions
Depending on the hypothesis testing being performed, there are three types of critical
regions
- Two-tailed test
- Left-tailed test
- Right-tailed test
a) Identify the correct type of test.
b) Identify the rejection and acceptance regions using 𝛼
c) Draw the graph and identify the critical value that separates the regions
6) Conclusion
1) Type I error: the mistake of rejecting the null hypothesis when it is actually true
2) Type II error: the mistake of failing to reject the null hypothesis when it is actually false
Example 1
A food manufacturing company produces fruit jams with 11 g of sugar in each jar. The Health
& Nutrition Department took a sample of 40 jars of jam to ensure that the amount of sugar in
the jam matches the amount stated on the nutritional label. The sample mean amount of
sugar was recorded to be 11.4 g with a standard deviation of 0.6 g. Is there enough evidence
to conclude that there is a mismatch between the nutritional information and the product?
Perform the hypothesis test at a 10% level of significance.
Notice that the null hypothesis is written to state that the mean amount of sugar of all
jams is equal to the claimed value, 11 g
The alternative hypothesis is a statement that differs from the null hypothesis. If there is
a mismatch between the nutritional information and the product, then the mean would
deviate from 11 g significantly.
3) Sampling distribution
The sample size is 40 jars of jam, which is considered as a large sample. Therefore, the
sampling distribution will be a normal distribution.
Using the following table, we could record information about the population and the
sampling distribution:
Population Sample
𝜇 = 11 𝑛 = 40
𝑥̅ = 11.4
𝑠 = 0.6
4) Test Statistic
Since the sampling distribution is a normal distribution, the test statistic will be
calculated with
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
$$z = \frac{11.4 - 11}{0.6/\sqrt{40}}$$
$$z = \frac{0.4}{0.094868329}$$
$$z \approx 4.22$$
5) Critical regions
The shaded region is known as the acceptance region and the non-shaded region is
known as the rejection region.
The rejection region is 5% in each tail, totaling the value of 𝛼 = 0.10.
To determine whether the test statistic is in the acceptance or rejection region, determine
the critical values that separate the regions.
The z-score −1.645 separates the lower 5% from the remaining area, and the z-score
1.645 separates the upper 5% from the remaining area.
From Step 4, the test statistic is 4.22, which is a z-score that is located in the rejection
region.
6) Conclusion
Since the test statistic is in the rejection region, the null hypothesis is rejected.
We can conclude that there is a mismatch between the nutritional information and the
product because it was found that the mean amount of sugar is significantly different
from 11 g, which is the value on the nutritional label.
Example 2
A research study was performed in an attempt to answer the above question. The mean math
score of male students is 76.0%. From a random sample of 34 female students, it was found
that the mean math score was 78.4% with a standard deviation of 14.2%. Perform a
hypothesis test with a 10% level of significance.
1) Hypotheses
𝐻0 : Female students do not perform better than male students in math, 𝜇 = 76.0
𝐻𝐴 : Female students perform better than male students in math, 𝜇 > 76.0
3) Sampling distribution
Population Sample
𝜇 = 76.0 𝑛 = 34
𝑥̅ = 78.4
𝑠 = 14.2
4) Test Statistic
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
$$z = \frac{78.4 - 76.0}{14.2/\sqrt{34}}$$
$$z = \frac{2.4}{2.435279909}$$
$$z \approx 0.99$$
5) Critical Regions
Because the alternative hypothesis is that 𝜇 > 76.0, we are testing whether 𝑥̅ is
significantly greater than the value of 76.0. This means that we are testing whether 𝑥̅
deviates so far above the value of 76.0 that there is a very low probability for that
deviation to occur.
This results in an upper-tail test, where the upper tail will be the critical region.
In Step 2, 𝛼 = 0.10
The shaded region represents the rejection region. The critical value is 𝑧 = 1.28
From Step 4, the test statistic is 0.99, which is a z-score that is located below the critical
value. The test statistic is located in the acceptance region.
6) Conclusion
Since the test statistic is in the acceptance region, it is concluded that we fail to reject
the null hypothesis.
We can conclude that there is not enough evidence to state that female students
perform significantly better than male students in math.
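Steps 4 to 6 of the two large-sample examples above can be sketched in Python as follows. The function `z_test` and its `tail` argument are hypothetical names introduced for illustration. The critical values are computed from the standard normal distribution, so they may differ in the last digit from the rounded table values used in the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu_0, s, n, alpha, tail):
    """Return (test statistic, critical value, reject H0?) for a large-sample test."""
    z = (x_bar - mu_0) / (s / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        reject = abs(z) > critical
    elif tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)  # all of alpha in the upper tail
        reject = z > critical
    elif tail == "lower":
        critical = std_normal.inv_cdf(alpha)  # all of alpha in the lower tail
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return z, critical, reject


# Example 1: sugar in jam, two-tailed test at a 10% level of significance.
jam = z_test(11.4, 11, 0.6, 40, alpha=0.10, tail="two")
# Example 2: math scores, upper-tailed test at a 10% level of significance.
math_scores = z_test(78.4, 76.0, 14.2, 34, alpha=0.10, tail="upper")

for label, (z, critical, reject) in [("Jam", jam), ("Math scores", math_scores)]:
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, critical value = {critical:.3f}, {decision}")
```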
Practice
1) The mean daily yield of a chemical produced by a pharmaceutical company is 880 metric
tons. The quality control department would like to determine whether this average has
changed recently. A random sample of 50 daily yields was taken and the sample mean
was 872 metric tons. The standard deviation was 23 metric tons. Perform a hypothesis
test using a 5% level of significance.
2) It is recommended that the daily sodium intake should not exceed 2300 mg. To
determine whether Canadians are exceeding this limit, a random sample of 300
Canadians were surveyed. It was found that the sample mean was 2550 mg and sample
standard deviation was 923 mg. Using a 10% level of significance, test the hypothesis to
determine whether Canadians are exceeding the daily sodium intake limit.
6.5 – Hypothesis Testing of Population Mean of a Small Sample
Recall that when a small sample (𝑛 < 30) is taken from a population that is normally
distributed, the sampling distribution tends to follow a 𝑡-distribution.
When performing a hypothesis test in a situation where a small sample is collected, the
method of hypothesis testing consists of the same 6 steps:
1) Hypotheses
2) Identify the Significance Level
3) Sampling Distribution
4) Test Statistic
5) Critical Regions
6) Conclusion
When calculating the test statistic, the relevant formula for a small sample would be the 𝑡
distribution formula:
Normal (𝑧): $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
𝑡-distribution: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
To determine the critical values that define the critical regions, 𝑡 values would be used.
Example 1
An ice cream company claimed that its product contained on average 500 calories per pint. To
test this claim, 24 one-pint containers were analyzed, giving a mean of 507 calories and a
standard deviation of 21 calories. Test the claim at a 2% level of significance.
1) Hypotheses
𝐻0 : The claim is true, there is on average 500 calories per pint of ice cream 𝜇 = 500
𝐻𝐴 : The claim is not true, the average calories per pint of ice cream is not 500 calories,
𝜇 ≠ 500
3) Sampling distribution
The sample size is 24, which is a small sample. Therefore, the sampling distribution will
be a 𝑡 distribution
Population Sample
𝜇 = 500 𝑛 = 24
𝑥̅ = 507
𝑠 = 21
4) Test statistic
Since the sampling distribution is a 𝑡 distribution with 𝑑𝑓 = 23, the test statistic will be
calculated with
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
$$t = \frac{507 - 500}{21/\sqrt{24}}$$
$$t = \frac{7}{4.28660705}$$
$$t \approx 1.633$$
5) Critical regions
This is a two-tailed test. Since 𝛼 = 0.02, the area in each tail is 0.01 or 1%. For 𝑑𝑓 = 23,
the critical values from the 𝑡-distribution table are 𝑡 = −2.500 and 𝑡 = 2.500.
The test statistic, 𝑡 ≈ 1.633, is located in the acceptance region.
6) Conclusion
Since the test statistic is in the acceptance region, we fail to reject the null hypothesis.
We can conclude that there is not enough evidence to say that the claim that the ice
cream contained on average 500 calories per pint is not true.
Example 2
A clinical trial was conducted to test the effectiveness of an antibiotic. Before treatment with the drug,
16 subjects had a mean duration of bacterial infection of 5.6 days. After treatment with the drug, the 16
subjects had a mean duration of bacterial infection of 4.0 days and a standard deviation of 1.6 days.
Perform a hypothesis test to test the claim that the antibiotic is effective in lowering the duration of
bacterial infection. Test the claim at a 10% level of significance.
1) Hypotheses
𝐻0 : The antibiotic is not effective in lowering the mean duration of infection, 𝜇 = 5.6
𝐻𝐴 : The antibiotic is effective in lowering the mean duration of infection, 𝜇 < 5.6
Population Sample
𝜇 = 5.6 𝑛 = 16
𝑥̅ = 4.0
𝑠 = 1.6
4) Test statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
$$t = \frac{4.0 - 5.6}{1.6/\sqrt{16}}$$
$$t = \frac{-1.6}{0.4}$$
$$t = -4.000$$
5) Critical Regions
This is a lower-tailed test, with 𝛼 = 0.10, or 10% of the area in the lower tail. For 𝑑𝑓 = 15,
the critical value from the 𝑡-distribution table is 𝑡 = −1.341.
6) Conclusion
Since the test statistic, 𝑡 = −4.000, is below the critical value, it is in the rejection region,
and we reject the null hypothesis.
We conclude that there is evidence that the antibiotic is effective because it significantly
lowered the duration of bacterial infection.
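The two small-sample examples above can be checked with a short Python sketch. The critical values 2.500 (𝑑𝑓 = 23, area 0.01 in each tail) and 1.341 (𝑑𝑓 = 15, area 0.10 in the lower tail) are standard 𝑡-table values entered by hand, because Python's standard library has no 𝑡-distribution. The function name `t_statistic` is a hypothetical helper.

```python
from math import sqrt


def t_statistic(x_bar, mu_0, s, n):
    """Test statistic for a small-sample test of a population mean."""
    return (x_bar - mu_0) / (s / sqrt(n))


# Example 1: ice cream calories, two-tailed test, alpha = 0.02, df = 23.
t_ice_cream = t_statistic(507, 500, 21, 24)
reject_ice_cream = abs(t_ice_cream) > 2.500  # critical values are -2.500 and 2.500

# Example 2: antibiotic, lower-tailed test, alpha = 0.10, df = 15.
t_antibiotic = t_statistic(4.0, 5.6, 1.6, 16)
reject_antibiotic = t_antibiotic < -1.341  # critical value is -1.341

print(f"Ice cream:  t = {t_ice_cream:.3f}, reject H0: {reject_ice_cream}")
print(f"Antibiotic: t = {t_antibiotic:.3f}, reject H0: {reject_antibiotic}")
```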
Practice
1) A simple random sample of the weights of 19 green M&Ms has a mean of 0.8635 g and
a standard deviation of 0.0570 g.
Use a 5% significance level to test the claim that the mean weight of all green M&Ms is
equal to 0.8535 g, which is the mean weight required so that M&Ms have the weight
printed on the package label.
Do green M&Ms appear to have weights consistent with the package label?
2) The director of admissions at a university claimed that families with an income of
$30,000 a year contributed an average of $6000 per family toward a child’s education. A
sample of 20 such families whose children attended the university revealed a mean
contribution of $6200 and a standard deviation of $300. Test the claim with a 5% level
of significance.
3) A plastic has a mean breaking strength of 27 pounds per square inch and a standard
deviation of 6 pounds per square inch. A new process is developed and will replace the
old one, provided that there is substantial evidence that it improves the strength of the
product. A random sample of 15 pieces made with the new process gives a sample
mean of 30 pounds per square inch and a standard deviation of 5.3 pounds per square
inch. Is there sufficient evidence to suggest that the strength of the product has
increased at a 1% level of significance?
5) If in the above question, the null hypothesis was rejected, what type of error was made
in making the conclusion?
6.6 – Hypothesis Testing of Population Proportion of a Large Sample
When testing a claim that involves a population proportion, the same steps of hypothesis
testing are used. However, the test statistic is calculated using a specific formula that is relevant
to population proportion, 𝑝
Recall that the sample proportion is represented by 𝑝̂ , and that 𝑞̂ = 1 − 𝑝̂. Similarly,
𝑞 = 1 − 𝑝 for the claimed population proportion 𝑝.
In the case of a large sample, the sampling distribution of the sample proportion, 𝑝̂, is
approximately the normal distribution.
The test statistic is calculated by:
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$
Example 1
In the 1970s, 15 out of 405 (3.7%) teachers hired in St. Louis County, Missouri, were African
American. In St. Louis County and a nearby city, 15.4% of teachers were African American.
The Equal Employment Opportunity Commission (EEOC) filed a lawsuit against the school
boards for discrimination against African Americans and won the case. The case depended
largely on a test of hypotheses.
If 15.4% is the proportion of African American teachers hired in the general population,
perform a hypothesis test at a 2.5% level of significance to determine whether it is a case of
discrimination in St. Louis County.
1) Hypotheses
𝐻0 : There is no discrimination against African American teachers in the hiring process,
𝑝 = 0.154
𝐻𝐴 : There is discrimination against African American teachers in the hiring process,
𝑝 < 0.154
3) Sampling distribution
Population Sample
𝑝 = 0.154 𝑛 = 405
𝑞 = 0.846 𝑝̂ = 0.037
𝑞̂ = 0.963
4) Test statistic
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$
$$z = \frac{0.037 - 0.154}{\sqrt{\dfrac{(0.154)(0.846)}{405}}}$$
$$z = \frac{-0.117}{0.017935687}$$
$$z \approx -6.52$$
5) Critical regions
Since this is a lower-tailed test, the area in the lower tail is 2.5%. The critical value
separating the acceptance and rejection regions is 𝑧 = −1.96
6) Conclusion
Since the test statistic is in the rejection region, it is concluded that we reject the null
hypothesis.
We can conclude that the proportion of African American teachers is significantly lower.
There is evidence to state that there is discrimination against African American teachers
in the hiring process.
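The test statistic and decision in this example can be reproduced with the following Python sketch. The variable names are illustrative only. The standard error uses the claimed population proportion 𝑝, as in the formula above, rather than the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

p_hat, p_0, n, alpha = 0.037, 0.154, 405, 0.025

# The standard error is based on the claimed proportion p_0, with q = 1 - p_0.
standard_error = sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / standard_error

# Lower-tailed test: the critical value has an area of alpha to its left.
critical = NormalDist().inv_cdf(alpha)
reject = z < critical

print(f"z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```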
Practice
1) A survey was conducted to determine the proportion of population that meets the
minimum daily fiber intake requirement, which is 25 g for women and 38 g for men.
Among the 514 people surveyed, 90% claimed that they do not meet the minimum daily
fiber intake requirement. Use a 5% significance level to test the claim that 85% of the
population does consume enough fiber per day.
2) In a previous educational research study, it was found that 14% of college students drop
out within the first year of their college studies. It was suspected that the dropout rate
has increased over the years. In a sample of 32 colleges, it was found that the dropout
rate is 17.3%. Use a 10% significance level to test the claim that the dropout rate has
increased significantly.
3) One organization claimed that about 20% of Canadian adults have a physically active
lifestyle. A random sample of 100 Canadians was surveyed. It was found that 79 adults
stated that they do not have a physically active lifestyle. Using a 10% level of
significance, perform a hypothesis test to test the claim that 20% of Canadian adults
have a physically active lifestyle.
4) A machine that manufactures machine parts produces 4.5% defectives. A technician
made an adjustment and believed that it would help
decrease the proportion of defectives produced. A sample of 60 parts was taken to test
whether the proportion of defectives has decreased. It was found that 2 of the 60 parts
were defective. Use a 2% level of significance to test whether there is a decrease in the
proportion of defectives produced.
Lab Activities
Unit 1
Activity: Graphical Representation of Data
The following data set shows the amount of consumption measured in Quadrillion BTUs
(British Thermal Unit) in the U.S. in 2009 in the following areas:
Electricity – 4.388
Natural Gas – 4.694
Propane – 0.492
Fuel Oil – 0.584
a) Determine the most appropriate graph to visually represent this data set.
b) Using Microsoft Excel, graph the data.
c) Can this data set be represented using a frequency distribution? Explain.
d) Using Microsoft Excel, create a stem-and-leaf plot of the data set.
Unit 2
Activity: Descriptive Statistics
The following data set lists the number of manatee deaths caused each year by collisions
with watercraft. Manatees are large mammals that live underwater and near waterways.
78 81 95 73 69 79 92 73 90 97
Unit 3
Activity 1: Linear Regression
Years of experience    Salary in thousands
12                     39
16                     41
6                      33
23                     44
27                     48
8                      34
5                      32
19                     44
23                     46
13                     37
16                     43
8                      37
[Figure: blank scatterplot titled "Salary of Employees", with Years of Experience on the horizontal axis and Salary in thousands on the vertical axis]
a) Using Excel, create a table to calculate the equation of the linear regression line.
b) Graph the data using a scatterplot.
c) Use Excel’s trend line feature to add a trend line to the scatterplot.
d) Verify that your calculated trend line is the same as the one provided by Excel.
𝑥    𝑦    𝑥𝑦    𝑥²
Sum
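As a cross-check on the Excel table, the column sums and the resulting trend line can be computed in a few lines. This is a sketch in Python rather than Excel, and it assumes the usual least-squares formulas built from the sums of 𝑥, 𝑦, 𝑥𝑦 and 𝑥². The data pairs are as read from the table in this activity.

```python
# (years of experience, salary in thousands), as read from the activity's table
data = [(12, 39), (16, 41), (6, 33), (23, 44), (27, 48), (8, 34),
        (5, 32), (19, 44), (23, 46), (13, 37), (16, 43), (8, 37)]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)

# Least-squares slope and intercept of the line y = intercept + slope * x
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"sums: x = {sum_x}, y = {sum_y}, xy = {sum_xy}, x^2 = {sum_x2}")
print(f"trend line: y = {intercept:.3f} + {slope:.3f}x")
```

The printed sums correspond to the "Sum" row of the table, so they can be compared directly with the spreadsheet before comparing the slope and intercept with Excel's trend line.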
e) Using Excel, determine the linear correlation coefficient of the above set of data using
this formula
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$
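The formula for 𝑟 translates almost term by term into code. The sketch below uses Python's standard statistics module instead of Excel. It assumes that 𝑠𝑥 and 𝑠𝑦 are sample standard deviations (dividing by 𝑛 − 1), which is what the (𝑛 − 1) factor in the formula suggests.

```python
from statistics import mean, stdev

x = [12, 16, 6, 23, 27, 8, 5, 19, 23, 13, 16, 8]        # years of experience
y = [39, 41, 33, 44, 48, 34, 32, 44, 46, 37, 43, 37]    # salary in thousands

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)   # sample standard deviations (divide by n - 1)

# Numerator: sum of products of deviations from the two means
numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = numerator / ((n - 1) * s_x * s_y)
print(f"r = {r:.4f}")
```

A value of 𝑟 close to 1 indicates a strong positive linear relationship between years of experience and salary.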
Unit 3
Activity 2: Linear Regression
Police investigators and forensic scientists can often predict the height of a suspect just from
a footprint that was left behind at a crime scene. This is because the size of a person’s foot
tends to relate to the height of a person. As you might already know, this is a positive
relationship. The taller the person, the larger their feet tend to be.
Unit 4
Activity: Binomial Probability Distribution
A survey of a group of adults showed that 26% of them are smokers. Adults are randomly
selected from that group with replacement.
a) Use Excel to create a probability distribution table and histogram to represent the
experiment of selecting 6 adults. Let 𝑥 represent the number of smokers selected.
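The probability table for this experiment can be generated directly from the binomial formula. The following sketch uses Python in place of Excel, with 𝑛 = 6 selections and 𝑝 = 0.26 taken from the activity. The text histogram is only a rough stand-in for an Excel chart.

```python
from math import comb

n, p = 6, 0.26   # 6 adults selected; P(smoker) = 0.26 on each selection

# P(x) = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n
distribution = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

print(" x   P(x)")
for x, prob in distribution.items():
    bar = "#" * round(prob * 50)   # crude text histogram, one '#' per 0.02
    print(f"{x:2d}  {prob:.4f}  {bar}")
```

Because selection is with replacement, the probability of selecting a smoker stays at 0.26 on every draw, which is what justifies the binomial model. The seven probabilities should sum to 1.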
Unit 5
Activity: Normal Distribution
Using Excel, create a formula that allows you to input 𝑥, 𝜇, and 𝜎, to calculate the 𝑧 score.
Using the formula you have created, answer the following questions:
b) Referring to the above question, what is the 90th percentile for body temperatures?
Use the 𝑧 score to determine the value of body temperature.
c) If the birth weight of a baby is normally distributed with a mean of 3.4 kg and a
standard deviation of 800 g, then what is the 𝑧 score of a baby with a birth weight of
3.6 kg?
d) Referring to the above question, what is the probability that a baby has a birth
weight of less than 3.6 kg?
g)
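The 𝑧 score formula and the related probability lookups can be sketched as follows. This uses Python's standard library rather than Excel, and the function name z_score is illustrative. The birth weight figures come from questions c) and d). Note that the standard deviation of 800 g is converted to 0.8 kg so that all three inputs share one unit.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Birth weights: mean 3.4 kg, standard deviation 800 g = 0.8 kg
# (convert to the same unit before standardizing).
z = z_score(3.6, 3.4, 0.8)
prob_below = NormalDist().cdf(z)   # P(Z < z) for a standard normal Z

# Going the other way: the z score that cuts off the lowest 90%,
# which converts back to a data value through x = mu + z * sigma.
z_90 = NormalDist().inv_cdf(0.90)

print(f"z = {z:.2f}, P(weight < 3.6 kg) = {prob_below:.4f}, z for 90th percentile = {z_90:.3f}")
```

The last value illustrates the percentile questions: once the 𝑧 score for the 90th percentile is known, the corresponding data value follows from 𝑥 = 𝜇 + 𝑧𝜎 for whichever mean and standard deviation the question supplies.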
Unit 6
Activity 1: Estimating a Population Parameter
Twelve different computer games that involved violent activities were sampled randomly.
The duration times (in seconds) of the violent activities were recorded, with the times listed
below. Use the sample data to construct a 95% confidence interval estimate of the mean
duration time of violence that all computer games of this type would involve.
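Because the sample is small (𝑛 = 12) and the population standard deviation is not given, one reasonable approach is the 𝑡-based interval with 𝑛 − 1 = 11 degrees of freedom, assuming the duration times are roughly normally distributed. The critical value 2.201 below is the two-tailed 5% entry for 11 degrees of freedom in a standard 𝑡 table. The sample mean and sample standard deviation must be computed from the recorded times.

```latex
\bar{x} - E < \mu < \bar{x} + E,
\qquad
E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 2.201 \cdot \frac{s}{\sqrt{12}}
```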
Unit 6
Activity 2: Hypothesis Testing
Listed below are speeds in km/h measured from southbound traffic on a Toronto city street.
This simple random sample was obtained at 3:30pm on a weekday. Use a 5% significance
level to test the claim that city drivers are driving less than the speed limit of 70 km/h on that
city street.
62 61 61 57 61 54 59 58 59 69 60 67
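The test statistic for this activity can be checked with a short script. This is a sketch in Python rather than Excel, and it assumes a left-tailed 𝑡 test of H0: 𝜇 = 70 against H1: 𝜇 < 70, since the sample is small and the population standard deviation is unknown. The critical value −1.796 is taken from a standard 𝑡 table (one tail, 𝛼 = 0.05, 11 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev

speeds = [62, 61, 61, 57, 61, 54, 59, 58, 59, 69, 60, 67]   # km/h
mu_0 = 70            # speed limit, the value of the mean under H0
t_critical = -1.796  # left-tailed, alpha = 0.05, df = 11 (from a t table)

n = len(speeds)
x_bar = mean(speeds)
s = stdev(speeds)                      # sample standard deviation
t = (x_bar - mu_0) / (s / sqrt(n))     # test statistic

reject_h0 = t < t_critical
print(f"mean = {x_bar:.2f}, s = {s:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

If the test statistic falls below the critical value, the sample supports the claim that drivers on this street travel below the speed limit on average.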
Answers
Chapter Answers