Topic 6 - Statistics B


Topic 6: Statistics B

USFP Maths B
Daniel Guo

Green textbook Chapter 20

Contents
• Revision of Statistics A
• Discrete probability distributions
• Continuous probability distributions
• Normal distribution
• 𝑧-scores
• Normal distribution tables

Last edited: 19/09/2023


Revision of Statistics A
Mean – sum all scores then divide by the number of scores.
Median – the middle score, when sorted.
Mode – the most common score.
Standard deviation – how ‘far’ the scores are from the mean.
To find the SD on a calculator: press Mode, 2:STAT, 1:1-VAR, type in the scores, then press AC. Then press Shift, 1:STAT, Var, and choose the statistic you need.

Example 1
Find the mean, median and population SD (to 2dp) of the following datasets:
a) 1, 4, 6, 6, 8, 9
b) 1, 4, 6, 6, 8, 9, 25
c) 5, 5, 5, 7, 7, 7

Example 2
Consider the scores 18, 4, 4, 11, 9, 7, 8, 7, 17, 14.
Find the mean, median, mode, and population SD.
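As a quick check of calculator work, here is a minimal Python sketch using the standard `statistics` module on the scores from Example 2. The variable names are only illustrative. Note that `multimode` is used because a dataset can have more than one mode.

```python
import statistics

# Scores from Example 2
scores = [18, 4, 4, 11, 9, 7, 8, 7, 17, 14]

mean = statistics.mean(scores)
median = statistics.median(scores)      # average of the two middle scores
modes = statistics.multimode(scores)    # every score tied for most common
pop_sd = statistics.pstdev(scores)      # population SD (divides by n)

print(mean, median, modes, round(pop_sd, 2))   # 9.9 8.5 [4, 7] 4.74
```

Here two scores (4 and 7) are tied for most common, so the dataset has two modes.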
Discrete Probability Distributions
When we roll a die, there are six outcomes that can happen. Let’s call 𝑋 the value of the die.
𝑥            1     2     3     4     5     6
𝑃(𝑋 = 𝑥)    1/6   1/6   1/6   1/6   1/6   1/6

This table of outcomes and probabilities is called a probability distribution.


The variable 𝑋 is called a random variable. It has a value that is determined by the outcome of an
experiment. The probability that the random variable 𝑋 takes the value 𝑥 is written 𝑃(𝑋 = 𝑥).

There are two types of random variable:


• Discrete random variable – often associated with counting. “How many pets do you have?”
• Continuous random variable – often associated with measuring. “What is your weight?”

Example 3
Four coins are thrown and the number of heads is recorded, denoted by 𝑋.
Complete the probability distribution table and find 𝑃(𝑋 = 3), 𝑃(𝑋 > 2), 𝑃(X is odd).
𝑥            0     1     2     3     4
𝑃(𝑋 = 𝑥)

A probability distribution must satisfy two properties:
• All probabilities are between 0 and 1
• The sum of the probabilities is 1
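The two properties can be checked by computer. As a sketch (not the only way to do it), the code below builds the four-coin distribution from Example 3 by listing all 16 equally likely outcomes, then checks both properties. The names `dist`, `p_three` and so on are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely outcomes of throwing four coins
outcomes = list(product("HT", repeat=4))
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) for x = 0..4, kept as exact fractions
dist = {x: Fraction(counts[x], len(outcomes)) for x in range(5)}

# Property 1: every probability is between 0 and 1
all_valid = all(0 <= p <= 1 for p in dist.values())
# Property 2: the probabilities sum to 1
total = sum(dist.values())

p_three = dist[3]                     # P(X = 3)
p_more_than_two = dist[3] + dist[4]   # P(X > 2)
p_odd = dist[1] + dist[3]             # P(X is odd)
print(p_three, p_more_than_two, p_odd)   # 1/4 5/16 1/2
```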

Example 4
The number of pets, 𝑋, owned by each student in a school is a random variable with the
following discrete probability distribution.
𝑥            0     1      2     3
𝑃(𝑋 = 𝑥)    0.5   0.25   0.2   0.05

a) What is the most common number of pets owned?


b) If two students are selected at random, what is the probability they own the same number
of pets?
c) If two students are selected at random, what is the probability at least one of them owns
exactly 2 pets?
Let’s go back to rolling a die. Let 𝑋 be the random variable that has the value of the die.

𝑥            1     2     3     4     5     6
𝑃(𝑋 = 𝑥)    1/6   1/6   1/6   1/6   1/6   1/6
If you roll a die many times, what would the average value of 𝑋 be? If this was a dataset, it would be called the
mean.
Here it is called the expected value 𝐸(𝑋), which you can think of as the ‘mean’ of a probability
distribution. Multiply each value 𝑥 by its probability 𝑝 = 𝑃(𝑋 = 𝑥), then add the results.

$$E(X) = \mu = \sum x\,p$$
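A minimal sketch of the expected value formula in Python, applied to the die. The helper name `expected_value` is made up for illustration, and `Fraction` is used so the answer stays exact.

```python
from fractions import Fraction

def expected_value(dist):
    """E(X): add up x * P(X = x) over every value x in the distribution."""
    return sum(x * p for x, p in dist.items())

# A fair die: each value 1..6 has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = expected_value(die)
print(mu)   # 7/2
```

So the long-run average of a fair die is 3.5, even though 3.5 is not a value the die can show.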

Example 5
The number of pets, 𝑋, owned by each student in a school is a random variable with the following discrete
probability distribution. What is 𝐸(𝑋)?
𝑥            0     1      2     3
𝑃(𝑋 = 𝑥)    0.5   0.25   0.2   0.05

Example 6
Consider the following discrete probability distribution.
𝑥            1    2     3     4     5
𝑃(𝑋 = 𝑥)    𝑎    2𝑎    3𝑎    4𝑎    5𝑎
a) Find the value of 𝑎.
b) Find the expected value.
The variance of a probability distribution Var(𝑋) has the formula

$$\operatorname{Var}(X) = \sigma^2 = E(X^2) - \mu^2 = \sum x^2 p - \mu^2$$

It is important because its square root is the standard deviation, 𝜎.
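As a sketch, the variance formula can be applied to a fair die in a few lines of Python. The function name `variance` is illustrative; exact fractions are used for the variance, and the SD is its square root.

```python
from fractions import Fraction
from math import sqrt

def variance(dist):
    """Var(X) = E(X^2) - mu^2 for a discrete distribution {x: P(X = x)}."""
    mu = sum(x * p for x, p in dist.items())
    e_x2 = sum(x**2 * p for x, p in dist.items())
    return e_x2 - mu**2

die = {x: Fraction(1, 6) for x in range(1, 7)}

var = variance(die)   # 91/6 - (7/2)^2
sd = sqrt(var)        # standard deviation is the square root of the variance
print(var, round(sd, 4))   # 35/12 1.7078
```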

Example 7
You roll a die. Show that the expected value is 3.5 and find the variance.
𝑥            1     2     3     4     5     6
𝑃(𝑋 = 𝑥)    1/6   1/6   1/6   1/6   1/6   1/6

Example 8
Find 𝐸(𝑋), Var(𝑋) and the standard deviation of the probability distribution below.
𝑥            0     1     2     3     4
𝑃(𝑋 = 𝑥)    0.0   0.1   0.2   0.3   0.4


Continuous Probability Distributions
The examples we have dealt with so far are
discrete random variables and probability
distributions, like rolling a die.
However, it is more common to have a
continuous random variable – for example,
the height of trees, or the results from an
exam.
If we gradually decrease the class range of a histogram for a continuous random variable, the
probability distribution will look like a curve, and is called the probability density function $f_X(x)$
or simply $f(x)$.

It has properties:
• $f(x) \ge 0$
• $\int_a^b f(x)\,dx = 1$
The probability that the value 𝑋 is between 𝑐 and 𝑑 will be given by
$$P(c \le X \le d) = \int_c^d f(x)\,dx$$
To summarize:

Discrete probability distribution (𝑋 is the value of rolling a die)
𝑥            1     2     3     4     5     6
𝑃(𝑋 = 𝑥)    1/6   1/6   1/6   1/6   1/6   1/6
• None of the probabilities are negative
• The sum of the probabilities is 1
• 𝑃(𝑋 = 𝑥) = a number between 0 and 1

Continuous probability distribution (𝑋 is time waiting in a queue)
$f(x) = 0.05e^{-0.05x}$
• $f(x) \ge 0$
• $\int_a^b f(x)\,dx = 1$
• 𝑃(𝑋 = 𝑥) = 0
• $P(c \le X \le d) = \int_c^d f(x)\,dx$
Example 9
Consider the probability density function $f(x) = \frac{x^2}{9}$ for $0 \le x \le 3$.
a) Sketch 𝑓(𝑥).
b) Check that 𝑓(𝑥) is a valid PDF.
c) Find 𝑃(1 ≤ 𝑋 ≤ 2).
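Areas under a PDF can also be estimated numerically. The sketch below uses Simpson's rule (a standard numerical integration method, not something from these slides) on the density from Example 9. The helper `simpson` is written here for illustration only.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points alternate weights 4, 2, 4, 2, ...
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f(x):
    return x**2 / 9   # density from Example 9, defined on 0 <= x <= 3

total_area = simpson(f, 0, 3)   # should be 1 for a valid PDF
p_1_to_2 = simpson(f, 1, 2)     # P(1 <= X <= 2), exact value 7/27
print(round(total_area, 6), round(p_1_to_2, 6))
```

The exact answers come from integrating by hand: the total area is 27/27 = 1 and the probability is (8 − 1)/27 = 7/27.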

Example 10
A continuous random variable 𝑋 has a probability density function given by
$$f(x) = \begin{cases} ax(5 - x), & 0 \le x \le 5 \\ 0, & x < 0 \text{ or } x > 5 \end{cases}$$
a) Find 𝑎 (a positive constant).
b) Express 𝑃(1 ≤ 𝑋 ≤ 4) as a definite integral.
c) Express 𝑃(𝑋 < 3) as a definite integral.
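The cumulative distribution function (CDF) is 𝐹(𝑥). It gives the area under the graph of the PDF to the left of 𝑥, starting from the lower end 𝑎 of the domain. Here 𝑡 is just the variable of integration.
$$P(X \le x) = F(x) = \int_a^x f(t)\,dt$$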

Example 11
The PDF of a random variable 𝑋 is
$$f(x) = \begin{cases} \frac{x + 1}{12}, & 0 \le x \le 4 \\ 0, & \text{otherwise} \end{cases}$$
a) Find the CDF of 𝑋.
b) Find the value of 𝑏 such that $P(X \le b) = \frac{5}{8}$.
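One possible working for Example 11 is sketched below in LaTeX. It integrates the PDF from the lower end of the domain (0) up to 𝑥, then solves 𝐹(𝑏) = 5/8. The dummy variable 𝑡 is a notational choice, not part of the question.

```latex
\begin{align*}
F(x) &= \int_0^x \frac{t + 1}{12}\,dt
      = \frac{1}{12}\left(\frac{x^2}{2} + x\right)
      = \frac{x^2 + 2x}{24}, \qquad 0 \le x \le 4 \\[4pt]
F(b) &= \frac{5}{8}
      \;\Rightarrow\; b^2 + 2b = 15
      \;\Rightarrow\; (b + 5)(b - 3) = 0
      \;\Rightarrow\; b = 3 \quad (\text{since } 0 \le b \le 4)
\end{align*}
```

As a check, 𝐹(4) = 24/24 = 1, as expected at the top of the domain.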
Example 12
A bid made at an auction for a property, in millions of dollars, can be modelled
by the random variable 𝑋 with the probability density function
$$f(x) = \begin{cases} k(16 - x^2), & 1 \le x \le 4 \\ 0, & \text{otherwise} \end{cases}$$
a) Show that $k = \frac{1}{27}$.
b) Find the CDF.
c) Find the probability that a bid of more than 3 million dollars will be made.
Exercise
Question 1
Let 𝑋 be the time spent waiting for a train which comes every fifteen minutes. The probability density
function of 𝑋 is $f(x) = \frac{1}{15}$, where $0 \le x \le 15$.
a) Find the CDF.
b) Find the probability you wait between 5 and 10 minutes for the train.

Question 2
Consider the probability density function 𝑓(𝑥) = 𝑘𝑥, where 0 ≤ 𝑥 ≤ 10.
a) Find the value of 𝑘 to make a valid PDF.
b) Find the CDF.
c) Find 𝑃(𝑋 ≤ 4).
d) Find 𝑃(𝑋 ≥ 4).
Sometimes, a probability density function will have an unbounded domain, and the total area becomes an
improper integral. We integrate up to a finite limit 𝑎, and then let 𝑎 tend to infinity.
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{a \to \infty} \int_{-a}^{a} f(x)\,dx = 1$$

Example 13
Consider the probability density function $f(x) = \frac{1}{x^2}$ for $1 < x < a$.
a) Sketch the curve.
b) Find the area under the curve, and take the limit as 𝑎 → ∞, showing that the function is a valid PDF.
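To see the limit in action, here is a small sketch for Example 13. Integrating 1/𝑥² from 1 to 𝑎 by hand gives 1 − 1/𝑎, and the code just evaluates that expression for larger and larger 𝑎. The function name `area_up_to` is illustrative.

```python
def area_up_to(a):
    """Exact area under f(x) = 1/x^2 between 1 and a, from [-1/x] evaluated at a and 1."""
    return 1 - 1 / a

# The area gets closer and closer to 1 as the upper limit a grows
areas = {a: area_up_to(a) for a in (10, 100, 10_000, 1_000_000)}
for a, area in areas.items():
    print(a, area)
```

The printed areas creep towards 1 but never pass it, which is what the limit as 𝑎 → ∞ describes.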
You can be asked to find the mean, median, and mode of a continuous PDF.
Mode – the most likely value, i.e. the value of 𝑥 where 𝑓(𝑥) is greatest (usually the maximum stationary point).
Median – the ‘middle’ value, i.e. the value of 𝑥 where the CDF 𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = 0.5.

Example 14
The diagram on the right shows a continuous PDF.
What is the mode?

Example 15
A probability density function is given by

$$f(x) = \begin{cases} \frac{1}{12}\left(8x - x^3\right), & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$
Find the median of this function.
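The median can also be found numerically by searching for the point where the CDF reaches 0.5. The sketch below does this for Example 15 using bisection (repeated halving of an interval). The CDF in the code comes from integrating the PDF by hand, and the helper name `bisect_median` is made up for this illustration.

```python
def cdf(x):
    """F(x) for f(x) = (8x - x^3)/12 on 0 <= x <= 2, i.e. (4x^2 - x^4/4) / 12."""
    return (4 * x**2 - x**4 / 4) / 12

def bisect_median(F, lo, hi, tol=1e-10):
    """Find m with F(m) = 0.5, assuming F increases from below 0.5 to above 0.5 on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid   # median is to the right of mid
        else:
            hi = mid   # median is at mid or to its left
    return (lo + hi) / 2

median = bisect_median(cdf, 0, 2)
print(round(median, 4))
```

By hand, 𝐹(𝑚) = 0.5 rearranges to 𝑚⁴ − 16𝑚² + 24 = 0, a quadratic in 𝑚², which should agree with the numerical answer.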
The mean or expected value of a continuous random variable has the formula

$$E(X) = \mu = \int x f(x)\,dx$$

The variance of a continuous random variable is

$$\operatorname{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 = \int x^2 f(x)\,dx - \mu^2$$

Example 16
The time 𝑋 hours to deliver a pizza from when it is ordered is a continuous random variable with probability
density function given by

$$f(x) = \begin{cases} \frac{4}{3} - \frac{2}{3}x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
a) What is the probability of a pizza being delivered within half an hour of being ordered?
b) Calculate the mean delivery time to the nearest minute.
c) Calculate the standard deviation of the delivery time to the nearest minute.
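As a rough numerical check of Example 16, the sketch below applies the mean and variance formulas using Simpson's rule (a standard numerical method, assumed here; it is not part of the slides). The helper `simpson` and the variable names are illustrative.

```python
from math import sqrt

def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f(x):
    return 4 / 3 - 2 * x / 3   # delivery-time density on 0 < x < 1 (x in hours)

p_half = simpson(f, 0, 0.5)                     # P(X < 0.5)
mean = simpson(lambda x: x * f(x), 0, 1)        # E(X) in hours
e_x2 = simpson(lambda x: x**2 * f(x), 0, 1)     # E(X^2)
sd = sqrt(e_x2 - mean**2)                       # standard deviation in hours

print(round(p_half, 4), round(mean * 60), round(sd * 60))   # 0.5833 27 17
```

The answers are multiplied by 60 at the end because 𝑋 is measured in hours but the question asks for minutes.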
Example 16.5
The random variable 𝑌 has the probability density function given by
$$f(y) = \begin{cases} \frac{c}{y}, & 10 < y < 100 \\ 0, & \text{otherwise} \end{cases}$$
a) Find 𝑐.
b) Find the median of 𝑌.
c) Find 𝐸(𝑌).
d) Find Var(𝑌)
e) Find the standard deviation of 𝑌.
Example 17 (2015 VCE Maths Methods Paper 2)
The function 𝑓 is a probability density function with rule
$$f(x) = \begin{cases} ae^{x}, & 0 \le x \le 1 \\ ae, & 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$

What is the value of 𝑎?

Example 18
The probability density function for the continuous random variable 𝑋 is given by
$$f(x) = \begin{cases} |1 - x|, & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$
What is the probability that 𝑋 < 1.5?
Summary so far:
• Show PDF is valid
• Find probability from a PDF
• Find CDF from PDF
• Find mean/expected value, median, mode of PDF
• Find variance and standard deviation of PDF

Summary example
The random variable 𝑋 has the probability density function 𝑓(𝑥) = 𝑎𝑥, where 0 < 𝑥 < 2.
a) Find the value of 𝑎.
b) Find 𝑃(0 < 𝑋 < 1.5).
c) Find 𝑃(𝑋 > 1.5).
d) Find the cumulative distribution function of 𝑋.
e) Find the median of 𝑋.
f) Find 𝐸(𝑋).
g) Find Var(𝑋).
h) Find the standard deviation of 𝑋.
Normal Distribution
Many random variables 𝑋 are ‘normally distributed’, such as your heights, or your marks.
This is a natural phenomenon – when you graph the PDF of 𝑋, it looks like a bell curve.
If 𝑋 is normally distributed, it has the same mean, median and mode, and is symmetrical around the
mean. We write 𝑋 ~ 𝑁(𝜇, 𝜎²). The standard normal distribution is 𝑍 ~ 𝑁(0, 1).

The normal distribution satisfies the properties of a probability density function:

• $f(x) \ge 0$ for all 𝑥
• $\int_{-\infty}^{\infty} f(x)\,dx = 1$

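The area under the normal distribution curve gives the probability of scoring between two values.
Unfortunately, the equation of the normal distribution has no antiderivative that can be written with standard functions, so we cannot find the area by integrating by hand.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$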
Since we cannot integrate the normal distribution curve, we
first approximate common values.

In a normal distribution, approximately:


• 68% of data lies within one SD of the mean (𝜇 ± 𝜎)
• 95% of data lies within two SD of the mean (𝜇 ± 2𝜎)
• 99.7% of data lies within three SD of the mean (𝜇 ± 3𝜎)
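These three percentages are rounded. As a sketch, Python's standard `statistics.NormalDist` class (Python 3.8 or later) can compute the areas more precisely for the standard normal distribution. The dictionary name `within` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal distribution

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

So "95%" is really about 95.45%, which is close enough for the examples that follow.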

Example 19
The weight of biscuits in a box is normally distributed. The
mean is 252.5g and the SD is 1.7g.
Fill in the sentences below.
a) 50% of the weights lie above___________________
b) 68% of the weights lie between ________ and _________
c) 95% of the weights lie between ________ and _________
d) 99.7% of the weights lie between ________ and _______
Example 20
The delivery times of a pizza shop follow a normal distribution, where the mean is 25 minutes and the
standard deviation is 5 minutes.
a) What percentage of deliveries are between 15 and 35 minutes?
b) What percentage of deliveries are greater than 30 minutes?
c) In 2 months, 4000 pizzas are delivered. How many of those pizzas are delivered in less than 10
minutes?

Example 21
𝑋 is a normally distributed random variable with a mean of 72 and a SD of 8.
Find the probability that:
a) 𝑋 is greater than 80
b) 64 < 𝑋 < 72
c) 𝑋 < 64 given that 𝑋 < 72

As mentioned before, we cannot find the area under the normal distribution curve by integrating by hand.
So how can we calculate the probabilities of a normal distribution?
𝑧-score
The 𝑧-score is the number of standard deviations a score
is from the mean. It is also called standardised score, as
it allows us to compare scores from different datasets.
Basically, we are transforming 𝑋 ~ 𝑁(𝜇, 𝜎²) into
𝑍 ~ 𝑁(0, 1). This is called standardization.
$$z = \frac{x - \mu}{\sigma}$$

For example, this is used to compare your results in
different subjects (what we call ‘scaling’).

Example 22
Megan got 63 in a class test. The mean was 84 and the
standard deviation was 7.
Find her 𝑧-score and interpret it.
Example 23
The table shows Jarrod’s exam results.
Topic      Mean   Standard Deviation   Jarrod’s Result
English    63     9                    81
Science    57     16                   81
a) What is Jarrod’s 𝑧-score for English?
b) Explain the 𝑧-score.
c) What Science mark is equivalent to the English mark?
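A minimal sketch of the 𝑧-score formula in Python, using the numbers from Examples 22 and 23. The function name `z_score` is illustrative. An "equivalent" mark is taken here to mean the mark with the same 𝑧-score, found by rearranging the formula to 𝑥 = 𝜇 + 𝑧𝜎.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that the score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example 22: Megan scored 63, class mean 84, SD 7
megan = z_score(63, 84, 7)

# Example 23: Jarrod's English result
english = z_score(81, 63, 9)

# Science mark with the same z-score as the English mark: x = mean + z * sd
science_equivalent = 57 + english * 16

print(megan, english, science_equivalent)   # -3.0 2.0 89.0
```

A negative 𝑧-score means the result is below the mean, and a positive one means it is above.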

Example 24
Let 𝑋 be a normally distributed random variable with mean 5 and SD 3, and let 𝑍 be the
standard normal variable.
a) Find 𝑃(𝑋 > 5).
b) Find 𝑏 such that 𝑃(𝑋 > 7) = 𝑃(𝑍 < 𝑏).
Normal Distribution Tables
In the last section, the probability that 𝑋 had values
between 𝑐 and 𝑑 was given by
$$P(c \le X \le d) = \int_c^d f(x)\,dx$$
However, the PDF 𝑓(𝑥) of a normally distributed variable 𝑋 cannot be integrated by hand.
So we have to resort to tables, calculated by computer.

Example 25
The time (in minutes) it takes students to complete an
exam is normally distributed, with a mean of 60 and SD
9. What is the probability of a student taking between 60
and 70 minutes?

We write 𝑃(60 < 𝑋 < 70), and look at a 𝑧-score table.
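A table lookup can be checked with Python's standard `statistics.NormalDist` class (Python 3.8 or later). The sketch below works out Example 25 two ways: directly from 𝑋 ~ 𝑁(60, 9²), and by first rounding the 𝑧-score to 2 decimal places, which is what reading a table usually involves. The variable names are illustrative.

```python
from statistics import NormalDist

# Exam times: mean 60 minutes, SD 9 minutes
X = NormalDist(mu=60, sigma=9)
p_direct = X.cdf(70) - X.cdf(60)   # P(60 < X < 70)

# Table-style working: standardise, round z to 2 decimal places, then look up
z = round((70 - 60) / 9, 2)        # 1.11
p_table = NormalDist().cdf(z) - NormalDist().cdf(0)

print(round(p_direct, 4), round(p_table, 4))
```

The two answers differ slightly in the fourth decimal place because the table method rounds 𝑧 before looking it up.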


Example 26
𝑍 is a standardised normal random variable.
Find the following using a table of 𝑧-scores.
𝑃(𝑍 > 1.47)
𝑃(−2.25 < 𝑍 < 1.6)
𝑃(0.12 < 𝑍 < 1.34)
End of content
Revision from now until last lesson

Final exam, 1 hour and 15 minutes, 36 marks total


12 multiple choice questions (1 mark each)
3 written questions (8 marks each)

Topics:
• Trigonometry B
• Exponentials and Logarithms
• Sequences and Series + Financial
• Applications of Differentiation
• Probability (Combinatorics)
• Statistics B
