H2 Maths Normal Distribution


Normal Distribution

In this unit, you will learn about one of the most important distributions in statistics: the normal (or Gaussian) distribution. It is used in many calculations in statistics.

The normal random variable is an example of a continuous random variable: it can take any value in a range, with no breaks in between. Contrast this with a discrete random variable, such as a binomial random variable, which can only take the separate values 0, 1, 2, ..., n.

A continuous random variable is defined by a probability density function.

1 Definition

Let X be a random variable that is normally distributed with mean 𝜇 and variance 𝜎². The probability density function of X is defined as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$

You can write this as X ~ N(𝜇, 𝜎²).
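A minimal Python sketch of this formula follows. It is a check, not a required method. It assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`. The helper `normal_pdf` is a hypothetical name introduced here. Note that `NormalDist` is parameterised by the standard deviation 𝜎, not the variance 𝜎².

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, typed in directly from the definition."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# X ~ N(10, 4): the variance is 4, so sigma = 2.
mu, sigma = 10, 2
dist = NormalDist(mu, sigma)  # library density, used only for comparison

for x in (6.0, 8.5, 10.0, 13.2):
    # The hand-written formula should agree with the library up to rounding error.
    assert math.isclose(normal_pdf(x, mu, sigma), dist.pdf(x), rel_tol=1e-12)

# The curve is symmetrical about x = mu, so points equally far from 10 have equal height.
print(normal_pdf(8, mu, sigma), normal_pdf(12, mu, sigma))
```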

2 Properties of the normal distribution

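(i) The probability density function is always positive, i.e. f(x) > 0 for all x.

(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e. the total area under the curve is 1.

(iii) P(X = a) = 0 for any single value a, so P(X < a) = P(X ≤ a).

(iv) $P(a \le X \le b) = \int_a^b f(x)\,dx$, which is the area bounded by the graph of y = f(x), the x-axis and the lines x = a and x = b.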

(v) The graph of y = f(x) is the familiar bell shape, which is symmetrical about the line x = 𝜇.

(vi) The x-axis is the horizontal asymptote of the function.

(vii) The standard deviation is denoted by 𝜎. It measures the spread of the normal distribution: the larger 𝜎 is, the wider and flatter the bell curve.

(viii) About 68% of the values in a normal distribution lie within one standard deviation of the mean, i.e. P(𝜇 − 𝜎 < X < 𝜇 + 𝜎) ≈ 0.68.

(ix) About 95% of the values in a normal distribution lie within two standard deviations of the mean, i.e. P(𝜇 − 2𝜎 < X < 𝜇 + 2𝜎) ≈ 0.95.

(x) About 99.7% of the values in a normal distribution lie within three standard deviations of the mean, i.e. P(𝜇 − 3𝜎 < X < 𝜇 + 3𝜎) ≈ 0.997.
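These properties can be checked numerically. The sketch below is one way to do it and assumes Python's standard-library `statistics.NormalDist`. The helper `area_under_pdf` is hypothetical. It uses Simpson's rule as a stand-in for the exact integral, and it integrates over 𝜇 ± 10𝜎 as a practical substitute for (−∞, ∞). The sketch does three things:

- approximates the total area (property (ii));
- compares the area between a and b with the CDF (property (iv));
- computes the one-, two- and three-standard-deviation probabilities (properties (viii) to (x)).

```python
from statistics import NormalDist


def area_under_pdf(dist: NormalDist, a: float, b: float, n: int = 2000) -> float:
    """Approximate the integral of dist.pdf from a to b using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = dist.pdf(a) + dist.pdf(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights: 4 on odd nodes, 2 on even nodes
        total += weight * dist.pdf(a + i * h)
    return total * h / 3


mu, sigma = 10, 2
X = NormalDist(mu, sigma)

# Property (ii): integrating over mu +/- 10 sigma captures essentially all of the area.
whole = area_under_pdf(X, mu - 10 * sigma, mu + 10 * sigma)

# Property (iv): the area between a and b equals P(a <= X <= b) from the CDF.
a, b = 8, 12
strip = area_under_pdf(X, a, b)
exact = X.cdf(b) - X.cdf(a)

# Properties (viii) to (x): P(mu - k*sigma < X < mu + k*sigma) for k = 1, 2, 3.
within = {k: X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma) for k in (1, 2, 3)}

print(round(whole, 6), round(strip, 6), round(exact, 6))
print({k: round(p, 4) for k, p in within.items()})
```

The three probabilities should come out close to 0.6827, 0.9545 and 0.9973. These are the values behind the rounded figures 68%, 95% and 99.7%.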

3 Calculations for the Normal Distribution using a GC

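The values of P(a ≤ X ≤ b) can be calculated using the normalcdf( function in your graphing calculator (GC). It can be found in 2nd Vars [Distr]. The function takes the lower bound, the upper bound, the mean 𝜇 and the standard deviation 𝜎. Note that you enter 𝜎, not the variance 𝜎².

The second screen shows the default values for normalcdf(.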
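Away from the GC, the same calculation can be sketched in Python. The helper below is a hypothetical stand-in for normalcdf(, built on the standard-library `statistics.NormalDist` (Python 3.8+). Its name, argument order and defaults (𝜇 = 0, 𝜎 = 1) are choices made for this sketch to mimic the calculator; they are not part of any library. Like the GC, it takes the standard deviation, not the variance.

```python
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(lower < X < upper) for X ~ N(mu, sigma^2), mimicking the GC's normalcdf(."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# As on the GC, -1E99 and 1E99 stand in for minus and plus infinity.
print(normalcdf(-1e99, 0))      # P(Z < 0) for the standard normal: 0.5
print(normalcdf(8, 12, 10, 2))  # P(8 < X < 12) for X ~ N(10, 4), so sigma = 2
```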

Example 1

Let X ~ N(10, 4). Calculate the following probabilities.

(i) P(8 < 𝑋 < 12) (ii) P(X < 5.5) (iii) P(𝑋 ≥ 14.5)

Solutions

All of these calculations can be done with a GC.

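(i) Calculate P(8 < X < 12) using the values shown below: lower = 8, upper = 12, 𝜇 = 10 and 𝜎 = 2. Since the variance is 4, the standard deviation is 𝜎 = √4 = 2.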

Thus, P(8 < X < 12) = 0.683.

(ii) To calculate P(X < 5.5), enter −E99 (that is, −1 × 10⁹⁹, which acts as −∞ for practical purposes) as the value for lower.

Hence, P(X < 5.5) = 0.0122.

(iii) Note that P(𝑋 ≥ 14.5) = P(𝑋 > 14.5) since P(X = 14.5) = 0.

There is no need to rewrite P(X > 14.5) as 1 − P(X ≤ 14.5). Just enter 14.5 as the lower value and E99 as the upper value.

The answer is P(X > 14.5) = 0.0122.

Note that this answer is the same as P(X < 5.5). This is because the normal distribution is symmetrical about its mean, which here is 10. The values 5.5 and 14.5 are both 4.5 away from 10, so the two tail areas are equal.

[Figure: shaded tail areas for P(X < 5.5) and P(X > 14.5)]
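Example 1 can be reproduced without a GC. The sketch below uses Python's standard-library `statistics.NormalDist` as an assumed substitute for normalcdf(. Because N(10, 4) means a variance of 4, the object is built with 𝜎 = 2.

```python
from statistics import NormalDist

# X ~ N(10, 4): the variance is 4, so the standard deviation is 2.
X = NormalDist(mu=10, sigma=4 ** 0.5)

p_i = X.cdf(12) - X.cdf(8)  # (i)   P(8 < X < 12)
p_ii = X.cdf(5.5)           # (ii)  P(X < 5.5)
p_iii = 1 - X.cdf(14.5)     # (iii) P(X >= 14.5) = P(X > 14.5), since P(X = 14.5) = 0

# Round to 3 significant figures, as in the worked solutions.
for label, p in (("(i)", p_i), ("(ii)", p_ii), ("(iii)", p_iii)):
    print(label, float(f"{p:.3g}"))
```

It should print 0.683, 0.0122 and 0.0122, matching the GC answers. The last two agree because the curve is symmetrical about 𝜇 = 10.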

4 Standard Normal Distribution

Sometimes, you may encounter questions where the mean, the variance, or both are unknown. To find them, you need to form equations involving the unknowns. This can be done by converting the given normal distribution into the standard normal distribution.

The standard normal distribution has a mean of 0 and a variance of 1 and is denoted by Z ~ N(0, 1). In statistics, the letter Z is reserved for the random variable with a standard normal distribution.

Given X ~ N(𝜇, 𝜎²), how can you change X into the standard normal distribution Z ~ N(0, 1)?

The relationship between X and Z is given by:

$$Z = \frac{X - \mu}{\sigma}$$

Hence, for any value a, P(X < a) = P(Z < (a − 𝜇)/𝜎).
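The sketch below checks the relationship numerically. It assumes Python's standard-library `statistics.NormalDist`; the helper `standardize` is a hypothetical name. For X ~ N(10, 2²), it confirms that P(X < a) equals P(Z < (a − 𝜇)/𝜎) for several values of a.

```python
from statistics import NormalDist

mu, sigma = 10, 2       # X ~ N(10, 2^2), as in Example 1
X = NormalDist(mu, sigma)
Z = NormalDist()        # standard normal: mean 0, standard deviation 1


def standardize(x: float) -> float:
    """z-value of x: how many standard deviations x lies above (or below) the mean."""
    return (x - mu) / sigma


for a in (5.5, 8, 12, 14.5):
    # P(X < a) should equal P(Z < (a - mu) / sigma), up to rounding error.
    assert abs(X.cdf(a) - Z.cdf(standardize(a))) < 1e-12

print(standardize(5.5), standardize(14.5))  # -2.25 2.25
```

The two z-values are equal in size and opposite in sign. This is the standardized form of the symmetry seen in Example 1.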
5 Inverse Normal

Sometimes you need to find the value of x such that P(X < x) = k, where 0 < k < 1.

In such cases, you can use the 3:invNorm( function, found in 2nd Vars [Distr].

Note that the value of x can only be found using this function when the probability is written in the form P(X < x). For example, rewrite P(X > x) = k as P(X < x) = 1 − k before using invNorm(.
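In Python, `statistics.NormalDist.inv_cdf` plays a similar role to invNorm(: it returns the x with P(X < x) = k. The sketch assumes that standard-library class. The right-tail case P(X > x) = 0.05 is a hypothetical illustration of the rewriting step, not one of the worked examples.

```python
from statistics import NormalDist

Z = NormalDist()                # standard normal N(0, 1)
X = NormalDist(mu=10, sigma=2)  # X ~ N(10, 4), as in Example 1

# Left tail, as invNorm( expects: find z with P(Z < z) = 0.2.
z = Z.inv_cdf(0.2)

# Illustrative right-tail case: P(X > x) = 0.05 is first rewritten as P(X < x) = 0.95.
x = X.inv_cdf(1 - 0.05)

print(round(z, 6), round(x, 4))
```

It should print -0.841621 and 13.2897. The first value reappears in Example 2(i).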

6 Linear Combinations

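Suppose X and Y are independent random variables and a and b are constants. The means and variances of linear combinations (sums and/or differences) of X and Y can be calculated as follows.

(i) E(aX ± bY) = aE(X) ± bE(Y)

(ii) Var(aX ± bY) = a² Var(X) + b² Var(Y)

Moreover, a linear combination of independent normal random variables is itself normally distributed. Thus, if X ~ N(𝜇₁, 𝜎₁²) and Y ~ N(𝜇₂, 𝜎₂²), then you have the following:

$$E(aX \pm bY) = a\mu_1 \pm b\mu_2$$
$$\operatorname{Var}(aX \pm bY) = a^2\sigma_1^2 + b^2\sigma_2^2$$

which means that

$$aX \pm bY \sim N\left(a\mu_1 \pm b\mu_2,\; a^2\sigma_1^2 + b^2\sigma_2^2\right)$$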

Note that E(aX ± bY) could be negative, but Var(aX ± bY) can never be negative. The variances are always added, even when the random variables are subtracted.
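The rules above can be written out as a short function. The sketch assumes Python's standard-library `statistics.NormalDist`; the helper `combine` is hypothetical, and the parameter values are illustrative only. `NormalDist` also supports `+`, `-` and multiplication by a constant directly, treating the operands as independent, so the two approaches can be compared. The last lines show why X₁ + X₂ (two independent values) and 2X (one value doubled) have the same mean but different variances.

```python
import math
from statistics import NormalDist


def combine(a: float, X: NormalDist, b: float, Y: NormalDist, sign: int = 1) -> NormalDist:
    """Distribution of aX + bY (sign=+1) or aX - bY (sign=-1) for independent normal X, Y."""
    mean = a * X.mean + sign * b * Y.mean
    var = a ** 2 * X.variance + b ** 2 * Y.variance  # variances add even when subtracting
    return NormalDist(mean, math.sqrt(var))


# Illustrative values: X ~ N(3.2, 0.5^2) and Y ~ N(7.8, 2.1^2).
X = NormalDist(3.2, 0.5)
Y = NormalDist(7.8, 2.1)

D = combine(2, X, 3, Y, sign=-1)  # 2X - 3Y: the mean is negative, the variance is not
D_builtin = 2 * X - 3 * Y         # NormalDist arithmetic treats operands as independent
print(round(D.mean, 4), round(D.variance, 4))

# Two independent copies X1 + X2 versus one value doubled, 2X: same mean, different variance.
two_copies = X + X  # Var = 2 * 0.25 = 0.5
doubled = 2 * X     # Var = 4 * 0.25 = 1.0
print(two_copies.variance, doubled.variance)
```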

Example 2

Eric loves to solve online Sudoku puzzles. The puzzles are classified as “Beginner”, “Intermediate” and “Expert”. The times, in minutes, that Eric takes to solve the puzzles are independent and normally distributed with the means and standard deviations shown in the following table.

Type            Mean (min)    Standard deviation (min)
Beginner        3.2           0.5
Intermediate    𝜇             2.1
Expert          12.3          𝜎

(i) Given that the probability that Eric takes more than 9.5 minutes in total to solve a “Beginner” and an “Intermediate” puzzle is 0.8, find the value of 𝜇, correct to 1 decimal place.

(ii) If the probability that Eric takes less than 10 minutes to solve an “Expert” puzzle is at most 0.25, find the maximum value of 𝜎, correct to 1 decimal place.

(iii) Suppose 𝜇 = 7.8 and 𝜎 = 3.2. Find the probability that the total time taken by Eric to solve an “Intermediate” and an “Expert” puzzle is more than four times the time he spends on a “Beginner” puzzle.

Solutions

(i) Let X be the random variable representing the time, in minutes, taken by Eric to solve a “Beginner” puzzle.

Let Y be the random variable representing the time, in minutes, taken by Eric to solve an “Intermediate” puzzle.

Thus, X ~ N(3.2, 0.5²) and Y ~ N(𝜇, 2.1²).

Let T be the total time, in minutes, taken by Eric to solve a “Beginner” and an “Intermediate” puzzle.

$$T = X + Y$$
$$E(T) = E(X) + E(Y) = 3.2 + \mu$$
$$\operatorname{Var}(T) = \operatorname{Var}(X) + \operatorname{Var}(Y) = 0.5^2 + 2.1^2 = 4.66$$
$$\therefore\; T \sim N(3.2 + \mu,\; 4.66)$$
Given: P(T > 9.5) = 0.8

Standardize this probability to get the following.

$$P\left(Z > \frac{9.5 - (3.2 + \mu)}{\sqrt{4.66}}\right) = 0.8$$
$$P\left(Z > \frac{6.3 - \mu}{\sqrt{4.66}}\right) = 0.8$$

Change P(Z > (6.3 − 𝜇)/√4.66) into 1 − P(Z < (6.3 − 𝜇)/√4.66) in order to use the invNorm( function.

$$1 - P\left(Z < \frac{6.3 - \mu}{\sqrt{4.66}}\right) = 0.8$$
$$P\left(Z < \frac{6.3 - \mu}{\sqrt{4.66}}\right) = 0.2$$
Use the invNorm( function to find the value of z for which P(Z < z) = 0.2.

The value of z obtained is 𝑧 = −0.841621.

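Hence,

$$\frac{6.3 - \mu}{\sqrt{4.66}} = -0.841621$$
$$6.3 - \mu = -0.841621\sqrt{4.66}$$
$$\mu = 6.3 + 0.841621\sqrt{4.66}$$
$$\mu = 8.11681$$
$$\mu = 8.1 \text{ (to 1 d.p.)}$$

This is consistent with the data. Since P(T > 9.5) = 0.8 > 0.5, the mean 3.2 + 𝜇 must lie above 9.5, i.e. 𝜇 > 6.3.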
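A numerical check of part (i) is sketched below. It assumes Python's standard-library `statistics.NormalDist` as a stand-in for invNorm( and normalcdf(. The code solves (6.3 − 𝜇)/√4.66 = z for 𝜇, where P(Z < z) = 0.2. It then confirms that the resulting T gives P(T > 9.5) = 0.8.

```python
import math
from statistics import NormalDist

var_T = 0.5 ** 2 + 2.1 ** 2    # Var(X) + Var(Y) = 4.66
z = NormalDist().inv_cdf(0.2)  # P(Z < z) = 0.2, so z is about -0.841621

# (6.3 - mu) / sqrt(4.66) = z  =>  mu = 6.3 - z * sqrt(4.66)  (z is negative)
mu = 6.3 - z * math.sqrt(var_T)

# Check: with this mu, T ~ N(3.2 + mu, 4.66) should give P(T > 9.5) = 0.8.
T = NormalDist(3.2 + mu, math.sqrt(var_T))
print(round(mu, 1), round(1 - T.cdf(9.5), 6))
```

The value of 𝜇 should round to 8.1. A value below 6.3 would put the mean of T below 9.5, which would make P(T > 9.5) less than 0.5.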

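(ii) Let A be the random variable representing the time taken, in minutes, to solve an “Expert” puzzle.

$$\therefore\; A \sim N(12.3, \sigma^2)$$

Given: P(A < 10) ≤ 0.25

Standardize this probability.

$$P\left(Z < \frac{10 - 12.3}{\sigma}\right) \le 0.25$$

Once again, use the invNorm( function to find the value of z for which P(Z < z) = 0.25. This value is z = −0.674490.

The inequality can be obtained by observing the standard normal graph. Since P(Z < z) increases as z increases, the left-tail area is at most 0.25 exactly when z ≤ −0.674490.

[Figure: standard normal curve with an area of 0.25 shaded to the left of z = −0.674490]

Thus,

$$\frac{10 - 12.3}{\sigma} \le -0.67449$$
$$-2.3 \le -0.67449\sigma \qquad (\text{multiplying both sides by } \sigma > 0)$$
$$\sigma \le \frac{2.3}{0.67449} = 3.40998 \qquad (\text{dividing by } -0.67449 \text{ reverses the inequality})$$

The maximum value of 𝜎 is 3.4 (to 1 d.p.).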
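The boundary value of 𝜎 can be checked with a short sketch. It assumes Python's standard-library `statistics.NormalDist`. The code finds 𝜎 from (10 − 12.3)/𝜎 = z. It then confirms that P(A < 10) equals 0.25 at that value and exceeds 0.25 for a slightly larger 𝜎.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.25)  # about -0.674490
sigma_max = (10 - 12.3) / z     # boundary case: (10 - 12.3) / sigma = z

# P(A < 10) grows with sigma: it equals 0.25 at sigma_max and exceeds 0.25 beyond it.
p_at_max = NormalDist(12.3, sigma_max).cdf(10)
p_beyond = NormalDist(12.3, sigma_max + 0.1).cdf(10)
print(round(sigma_max, 1), round(p_at_max, 6), p_beyond > 0.25)
```

The result is positive, as a standard deviation must be, and it rounds to 3.4.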

(iii) As stated, you have the following values for the normal distributions.

X ~ N(3.2, 0.5²), Y ~ N(7.8, 2.1²), A ~ N(12.3, 3.2²)

Total time taken by Eric to solve an “Intermediate” and an “Expert” puzzle = Y + A

You want to find P(Y + A > 4X), or equivalently, P(Y + A − 4X > 0). Here 4X is used, rather than X₁ + X₂ + X₃ + X₄, because the comparison is with four times the time for a single “Beginner” puzzle.

$$E(Y + A - 4X) = E(Y) + E(A) - 4E(X) = 7.8 + 12.3 - 4(3.2) = 7.3$$
$$\operatorname{Var}(Y + A - 4X) = \operatorname{Var}(Y) + \operatorname{Var}(A) + 4^2\operatorname{Var}(X) = 2.1^2 + 3.2^2 + 16(0.5^2) = 18.65$$
$$\therefore\; Y + A - 4X \sim N(7.3,\; 18.65)$$

P(Y + A − 4X > 0) = 0.955
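Part (iii) can be checked with the sketch below. It assumes Python's standard-library `statistics.NormalDist`, whose `+`, `-` and scalar `*` operators treat the operands as independent normal variables.

```python
from statistics import NormalDist

X = NormalDist(3.2, 0.5)   # Beginner
Y = NormalDist(7.8, 2.1)   # Intermediate (mu = 7.8 given in part (iii))
A = NormalDist(12.3, 3.2)  # Expert (sigma = 3.2 given in part (iii))

# 4 * X scales a single Beginner time by 4 (variance 16 * 0.25), not four separate puzzles.
W = Y + A - 4 * X
p = 1 - W.cdf(0)  # P(Y + A - 4X > 0)
print(round(W.mean, 4), round(W.variance, 4), round(p, 3))
```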

Example 3

A manufacturer produces chocolate bars and candy bars. The masses, in grams, of the chocolate bars and candy bars are modelled as having independent normal distributions with means and standard deviations as shown in the table.

                 Mean mass (g)    Standard deviation (g)
Chocolate bar    25               0.3
Candy bar        32               0.5
(i) Find the probability that the mean mass of 2 randomly chosen chocolate bars and 3 randomly chosen candy bars is less than 29 grams.

(ii) A quality control check was conducted to ensure that the chocolate bars produced are of similar quality. It was found that 95% of the time, the difference in mass between any 2 randomly chosen chocolate bars is less than k grams. Find the value of k.

The chocolate bars and candy bars are sold by weight at $0.08 per gram and $0.03 per gram respectively.

(iii) Find the probability that the total selling price of a randomly chosen chocolate bar and a randomly chosen candy bar is more than $3.

Solutions

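(i) Let X be the random variable representing the mass of a chocolate bar.

Let Y be the random variable representing the mass of a candy bar.

X ~ N(25, 0.3²) and Y ~ N(32, 0.5²)

Total mass of 2 chocolate bars and 3 candy bars = X₁ + X₂ + Y₁ + Y₂ + Y₃

Let M be the mean mass of 2 chocolate bars and 3 candy bars.

$$\therefore\; M = \frac{X_1 + X_2 + Y_1 + Y_2 + Y_3}{5}$$

$$\begin{aligned}
E\left(\frac{X_1 + X_2 + Y_1 + Y_2 + Y_3}{5}\right) &= \frac{1}{5}E(X_1 + X_2 + Y_1 + Y_2 + Y_3)\\
&= \frac{1}{5}\left[E(X_1) + E(X_2) + E(Y_1) + E(Y_2) + E(Y_3)\right]\\
&= \frac{1}{5}\left[2E(X) + 3E(Y)\right]\\
&= \frac{1}{5}\left[2(25) + 3(32)\right]\\
&= 29.2
\end{aligned}$$

$$\begin{aligned}
\operatorname{Var}\left(\frac{X_1 + X_2 + Y_1 + Y_2 + Y_3}{5}\right) &= \frac{1}{5^2}\operatorname{Var}(X_1 + X_2 + Y_1 + Y_2 + Y_3)\\
&= \frac{1}{25}\left[\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) + \operatorname{Var}(Y_3)\right]\\
&= \frac{1}{25}\left[2\operatorname{Var}(X) + 3\operatorname{Var}(Y)\right]\\
&= \frac{1}{25}\left[2(0.3^2) + 3(0.5^2)\right]\\
&= 0.0372
\end{aligned}$$

Hence, M ~ N(29.2, 0.0372).

P(M < 29) = 0.149879 = 0.150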
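Part (i) can be checked with a sketch. It assumes Python's standard-library `statistics.NormalDist`, whose `+` treats each operand as an independent variable. Writing `X + X` therefore models two separate bars X₁ + X₂, not 2X.

```python
from statistics import NormalDist

X = NormalDist(25, 0.3)  # chocolate bar mass (g)
Y = NormalDist(32, 0.5)  # candy bar mass (g)

# Five independent bars: X1 + X2 + Y1 + Y2 + Y3 (not 2X + 3Y).
total = X + X + Y + Y + Y
M = total / 5  # mean mass of the five bars
print(round(M.mean, 4), round(M.variance, 6), round(M.cdf(29), 4))
```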

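(ii) Let D = X₁ − X₂.

$$E(D) = E(X_1) - E(X_2) = 0$$
$$\operatorname{Var}(D) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) = 2(0.3^2) = 0.18$$

Therefore, D ~ N(0, 0.18).

Given: P(|D| < k) = 0.95

$$P(-k < D < k) = 0.95$$
$$P\left(-\frac{k}{\sqrt{0.18}} < \frac{D}{\sqrt{0.18}} < \frac{k}{\sqrt{0.18}}\right) = 0.95$$
$$P\left(-\frac{k}{\sqrt{0.18}} < Z < \frac{k}{\sqrt{0.18}}\right) = 0.95$$

Sketch a graph to represent this probability.

[Figure: standard normal curve with a central area of 0.95 between z = −1.95996 and z = 1.95996, and an area of 0.025 in each tail]

The area to the left of k/√0.18 is 0.95 + 0.025 = 0.975, so invNorm(0.975) gives 1.95996.

Thus,

$$\frac{k}{\sqrt{0.18}} = 1.95996$$
$$k = 0.83154 = 0.832$$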
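The value of k can be confirmed with a short sketch, assuming Python's standard-library `statistics.NormalDist`. Subtracting two `NormalDist` objects treats them as independent, which matches two separately chosen bars.

```python
from statistics import NormalDist

X = NormalDist(25, 0.3)
D = X - X  # X1 - X2 for two independent bars: N(0, 0.18)

# P(-k < D < k) = 0.95 leaves 0.025 in each tail, so k is the 97.5th percentile of D.
k = D.inv_cdf(0.975)
print(round(D.variance, 4), round(k, 3))
```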

(iii) Let S₁ = 0.08X and S₂ = 0.03Y.

$$E(S_1 + S_2) = 0.08E(X) + 0.03E(Y) = 0.08(25) + 0.03(32) = 2.96$$
$$\operatorname{Var}(S_1 + S_2) = 0.08^2\operatorname{Var}(X) + 0.03^2\operatorname{Var}(Y) = 0.08^2(0.3^2) + 0.03^2(0.5^2) = 0.000801$$
$$S_1 + S_2 \sim N(2.96,\; 0.000801)$$

P(S₁ + S₂ > 3) = 0.0788
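Part (iii) can be checked with the sketch below. It assumes Python's standard-library `statistics.NormalDist`, where multiplying by a constant scales both the mean and the standard deviation.

```python
from statistics import NormalDist

X = NormalDist(25, 0.3)        # chocolate bar mass (g)
Y = NormalDist(32, 0.5)        # candy bar mass (g)
price = 0.08 * X + 0.03 * Y    # S1 + S2, in dollars
p = 1 - price.cdf(3)           # P(S1 + S2 > 3)
print(round(price.mean, 2), round(price.variance, 6), round(p, 4))
```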

Example 4

Peter bought an ice-cream machine. The time taken by the machine to produce a large tub of ice-cream follows a normal distribution with mean 𝜇 minutes and standard deviation 𝜎 minutes. It is found that there is an 88% chance that the machine will take less than 60 minutes and a 70% chance that it will take more than 50 minutes to produce a large tub of ice-cream. The time taken by the machine to produce a small tub of ice-cream also follows a normal distribution with mean 20 minutes and standard deviation 2 minutes. The times taken by the machine to produce a large tub and a small tub of ice-cream are independent of each other.

(i) Find 𝜇 and 𝜎.

(ii) Find the probability that the difference between the time taken by the machine to produce 5 large tubs of ice-cream and thrice the amount of time taken to produce 3 small tubs of ice-cream is more than 1 hour.

Solutions

(i) Let X be the random variable representing the amount of time taken, in minutes, to produce a large tub of ice-cream.

Let Y be the random variable representing the amount of time taken, in minutes, to produce a small tub of ice-cream.

Then X ~ N(𝜇, 𝜎²) and Y ~ N(20, 2²).

Given: P(X < 60) = 0.88


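$$P\left(Z < \frac{60 - \mu}{\sigma}\right) = 0.88$$
$$\therefore\; \frac{60 - \mu}{\sigma} = 1.17499$$
$$60 - \mu = 1.17499\sigma$$
$$1.17499\sigma + \mu = 60 \qquad (1)$$

$$P(X > 50) = 0.7$$
$$1 - P(X < 50) = 0.7$$
$$P(X < 50) = 0.3$$
$$P\left(Z < \frac{50 - \mu}{\sigma}\right) = 0.3$$
$$\therefore\; \frac{50 - \mu}{\sigma} = -0.524401$$
$$50 - \mu = -0.524401\sigma$$
$$-0.524401\sigma + \mu = 50 \qquad (2)$$

Solve (1) and (2) simultaneously. Subtracting (2) from (1) gives 1.699391𝜎 = 10, so

𝜎 = 5.8845 and 𝜇 = 53.0858.

Thus, 𝜇 = 53.1 and 𝜎 = 5.88.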
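The simultaneous equations can be solved and checked in a few lines. The sketch assumes Python's standard-library `statistics.NormalDist`, with `inv_cdf` standing in for invNorm(. The final assertions confirm that the fitted distribution reproduces both given probabilities.

```python
from statistics import NormalDist

Z = NormalDist()
z1 = Z.inv_cdf(0.88)  # (60 - mu) / sigma = z1
z2 = Z.inv_cdf(0.30)  # (50 - mu) / sigma = z2

# Subtracting the two equations: (60 - 50) / sigma = z1 - z2.
sigma = (60 - 50) / (z1 - z2)
mu = 60 - z1 * sigma

X = NormalDist(mu, sigma)
assert abs(X.cdf(60) - 0.88) < 1e-9      # P(X < 60) = 0.88
assert abs(1 - X.cdf(50) - 0.70) < 1e-9  # P(X > 50) = 0.7
print(round(mu, 1), round(sigma, 2))
```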

(ii) Let T = X₁ + X₂ + ⋯ + X₅ − 3(Y₁ + Y₂ + Y₃).

$$E(T) = 5E(X) - 3\left[3E(Y)\right] = 5(53.0858) - 3(3 \times 20) = 85.429$$
$$\operatorname{Var}(T) = 5\operatorname{Var}(X) + 3^2\left[3\operatorname{Var}(Y)\right] = 5(5.8845^2) + 9(12) = 281.137$$
$$\therefore\; T \sim N(85.429,\; 281.137)$$

P(T > 60) = 0.935
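Part (ii) can be checked with the sketch below. It assumes Python's standard-library `statistics.NormalDist`. The sketch recomputes 𝜇 and 𝜎 from part (i) so that it stands alone. The five large tubs and three small tubs are modelled as independent copies, and the factor 3 multiplies the total of the three small tubs.

```python
from statistics import NormalDist

Z = NormalDist()
z1, z2 = Z.inv_cdf(0.88), Z.inv_cdf(0.30)
sigma = 10 / (z1 - z2)  # from part (i)
mu = 60 - z1 * sigma

X = NormalDist(mu, sigma)  # large tub (min)
Y = NormalDist(20, 2)      # small tub (min)

# Five independent large tubs, minus 3 times the total time of three independent small tubs.
large_total = X + X + X + X + X
small_total = Y + Y + Y
T = large_total - 3 * small_total
p = 1 - T.cdf(60)  # P(T > 60), with 1 hour = 60 minutes
print(round(T.mean, 3), round(T.variance, 2), round(p, 3))
```

Using the unrounded 𝜎 gives a variance that differs from 281.137 only in the third decimal place. The probability still rounds to 0.935.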

Problem Solving Techniques

1. The random variable has to be defined clearly and explicitly for any question involving the normal distribution. If the parameter 𝜇 or 𝜎 changes, a new random variable has to be defined with a new uppercase letter.

2. Be aware of the units used in the question and define the random variables with the correct units.

3. The mean and variance of the normal distribution have to be stated in the question or given indirectly as probabilities. Thus, it should be clear when you are dealing with a question involving the normal distribution.

4. If the mean and/or the variance is unknown, you need to standardize the given probabilities so that you can form equations to solve for the unknowns.

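5. Read the question carefully to decide which linear combination of the random variables is needed. Decide whether to add or subtract. Also decide whether a quantity is a sum of independent observations (such as X₁ + X₂) or a multiple of a single observation (such as 2X).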
