06a - The Normal Distribution
Mathematical
Methods
Dr. Asim Khwaja
Lecture 06a
The Mean and the SD
• By now we have seen how the average and SD work together
to give us a very good idea of where a distribution is situated
• We stand at the average and walk a few SDs on either side and
we pick up the bulk of the distribution, no matter what the shape of
the histogram
• So, we’ll see when to use the bell curve as an approximation for
data, how to use it as an approximation for data and when not
to use it
Shapes of Distributions
• We have seen a number of distributions and they all have
varied shapes – very skewed, skewed, and symmetrical
Density Curves
• A density curve is a graphical and mathematical
representation/model of a quantitative variable where the
outcomes are continuous.
• It shows probability.
• The area under a density curve and above any range of values
is the relative frequency of all observations that fall in that
range.
Density Curves
• A smooth density curve is an idealization that gives the overall pattern of
the data but ignores minor irregularities.
• These statements are useful when the idealized world is similar to the real
world.
Density Curves
• Why is the area under the curve = 1?
• A histogram isn’t a curve, but the principle is the same: the bar areas
are the relative frequencies, which add up to 100%, so the total area
under the bars = 1.
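The point about the bars can be checked numerically. Below is a minimal Python sketch, assuming simulated data and equal-width bins (the function name `density_histogram`, the sample, and the bin count are illustrative and not from the lecture). Each bar height is the relative frequency divided by the bin width, so height times width recovers the relative frequency and the areas sum to 1.

```python
import random

def density_histogram(data, bin_edges):
    """Return bar heights scaled so that the total bar area is 1."""
    n = len(data)
    last = len(bin_edges) - 2
    heights = []
    for i, (lo, hi) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
        if i == last:
            # Last bin is closed on the right so the maximum value is counted.
            count = sum(1 for x in data if lo <= x <= hi)
        else:
            count = sum(1 for x in data if lo <= x < hi)
        # height = relative frequency / bin width
        heights.append(count / (n * (hi - lo)))
    return heights

random.seed(0)                                    # illustrative sample only
data = [random.gauss(0, 1) for _ in range(1000)]

k = 20                                            # number of bins (arbitrary)
low, high = min(data), max(data)
width = (high - low) / k
edges = [low + i * width for i in range(k)] + [high]

heights = density_histogram(data, edges)
total_area = sum(h * (b - a) for h, a, b in zip(heights, edges[:-1], edges[1:]))
print(total_area)   # equals 1 up to floating-point rounding
```

The same holds for any choice of bins, as long as every observation falls in exactly one bin.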
Density Curves
• It gives us a good idea of the shape of a distribution.
Normal Distribution (Bell Curve)
• One of the density curves is the Bell Curve.
• It’s known by different names – the Normal distribution, the bell curve, and the
Gaussian curve
• Indicates that extremely small or extremely large values are rare and the bulk of
the values lie around the mean
The Quincunx Demo
• https://www.mathsisfun.com/data/quincunx.html
The Normal Distribution
• Many variables follow this kind of distribution
• So, we’re going to start out with the examination of the bell-
shaped curve that underlies all such bell-shaped curves
• The interesting thing about this curve is that its integral from point a
to point b cannot be expressed in closed form (i.e. as a formula) in terms
of the usual standard functions (polynomials, exponentials, etc.)
The Standard Normal Curve
• Well, we still need the areas under portions of this curve
• So we approximate!
• And this practice prevails over the entire field of statistics and
probability – what we don’t know, we approximate!
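One way to approximate such an area is sketched below in Python. It assumes the textbook density of the standard normal curve, exp(-z²/2)/√(2π), which does not appear in the extracted slides, and uses the trapezoid rule with an arbitrary step count. The names `phi`, `area_trapezoid` and `area_erf` are illustrative. The result is compared with the error function `math.erf`, which is itself evaluated numerically.

```python
import math

def phi(z):
    """Height of the standard normal curve at z (textbook formula, assumed)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area_trapezoid(a, b, steps=10_000):
    """Approximate the area under phi between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (phi(a) + phi(b))
    for i in range(1, steps):
        total += phi(a + i * h)
    return total * h

def area_erf(a, b):
    """Same area via the error function."""
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cdf(b) - cdf(a)

print(round(area_trapezoid(-1, 1), 4))  # about 0.68
print(round(area_erf(-1, 1), 4))        # agrees to these decimals
```

Printed z tables are built from approximations of this kind, tabulated once for the standard curve.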
Useful to Remember
Approximations
• Central areas (since the curve is symmetric):
• Between z = -1 and z = 1: about 68%
• Between z = -2 and z = 2: about 95%
• Tail areas:
• To the left of -1: about 16%
• To the right of 1: about 16%
• To the left of -2: about 2.5%
• To the right of 2: about 2.5%
• Percentiles:
• 95th percentile: z = 1.65, roughly
• 5th percentile: z = -1.65, roughly
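These rounded figures can be checked with Python's standard library. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

central_1 = Z.cdf(1) - Z.cdf(-1)   # area between z = -1 and z = 1
central_2 = Z.cdf(2) - Z.cdf(-2)   # area between z = -2 and z = 2
left_of_minus_1 = Z.cdf(-1)        # tail to the left of -1
right_of_2 = 1 - Z.cdf(2)          # tail to the right of 2
p95 = Z.inv_cdf(0.95)              # 95th percentile
p05 = Z.inv_cdf(0.05)              # 5th percentile

for name, value in [
    ("between -1 and 1", central_1),
    ("between -2 and 2", central_2),
    ("left of -1", left_of_minus_1),
    ("right of 2", right_of_2),
    ("95th percentile", p95),
    ("5th percentile", p05),
]:
    print(f"{name}: {value:.4f}")
```

The tail beyond z = 2 comes out a little under 2.5%, because the central area is slightly more than 95%. The slide's figures are deliberately rounded for easy recall.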
The General Form of the Normal
Curve Family
The General Form of the Normal
Curve
• The normal curve with mean μ and the SD σ
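The formula itself is not present in the extracted slide text. For reference, the standard textbook form of the normal curve with mean μ and SD σ is given below; setting μ = 0 and σ = 1 gives the standard normal curve.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\qquad -\infty < x < \infty
```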
Benefit of the Standard Scale z
• There is not just one normal curve but a family of them, differentiated
by their mean and SD
• No formula exists into which we can plug two values and get the area
under the normal curve between those two points
• Converting each value x to the standard scale, z = (x − μ)/σ, lets the
areas of the one standard normal curve serve the whole family
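A minimal Python sketch of how the standard scale is used in practice: any value x from a normal curve with mean μ and SD σ is converted to z = (x − μ)/σ, and the area is then read off the standard curve. The helper names `to_z` and `area_between` are illustrative. The example numbers reuse the means and SDs that appear later in the lecture.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # the single standard normal curve (mean 0, SD 1)

def to_z(x, mu, sigma):
    """Convert a value to standard units."""
    return (x - mu) / sigma

def area_between(x1, x2, mu, sigma):
    """Area under the normal curve (mu, sigma) between x1 and x2,
    computed only from the standard normal curve."""
    return STANDARD.cdf(to_z(x2, mu, sigma)) - STANDARD.cdf(to_z(x1, mu, sigma))

# Within 1 SD of the mean, for two different members of the family:
print(round(area_between(16.4, 29.6, mu=23, sigma=6.6), 2))
print(round(area_between(990, 1030, mu=1010, sigma=20), 2))
```

Both calls give about 0.68, because both ranges run from z = -1 to z = 1.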
The test must have been really hard, so the Prof decides to standardize all the
scores and only fail people 1 SD below the mean.
The Mean is 23 and the SD is 6.6, and these are the standardized scores:
-0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91
Only 2 students will fail (the ones who scored 15 and 14 on the test)
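The example can be replayed in a few lines of Python. This is a sketch using the mean, SD and standardized scores listed above. The raw scores are recovered by inverting z = (x − mean)/SD and rounding to whole marks, which assumes the original test scores were whole numbers.

```python
MEAN, SD = 23, 6.6
z_scores = [-0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91]

def to_z(x, mean=MEAN, sd=SD):
    """Standardize a raw score."""
    return (x - mean) / sd

def to_raw(z, mean=MEAN, sd=SD):
    """Recover a raw score from its standardized value."""
    return mean + z * sd

# A student fails when the standardized score is more than 1 SD below the mean.
failing = [round(to_raw(z)) for z in z_scores if z < -1]
print(failing)  # [15, 14]
```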
Another Application of the
Normal Distribution
Your company packages sugar in 1 kg bags. When you weigh a
sample of bags you get these results:
- 1007g, 1032g, 1002g, 983g, 1004g, … (a hundred measurements)
- Mean = 1010g
- SD = 20g
Some values are less than 1000g – how can you fix that?
The normal distribution of your measurements looks like this:
31% of the bags are less than 1000g, which is cheating the
customer!
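The 31% figure follows from the mean and SD if the weights are taken to be normally distributed. A short Python check, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from statistics import NormalDist

bags = NormalDist(mu=1010, sigma=20)   # mean and SD from the sample

z = (1000 - 1010) / 20                 # 1000g in standard units: -0.5
underweight = bags.cdf(1000)           # fraction of bags below 1000g

print(z)
print(f"{underweight:.1%}")            # roughly 31%
```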
It is a random thing, so we can’t stop bags having less than
1000g, but we can try to reduce it a lot.
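How? There are two ways: raise the mean (put more sugar in every bag)
or reduce the SD (fill the bags more accurately).
The first one may be cheaper in the early stages but causes
more losses to you in the long run as you are giving away more
sugar for free.
For the second, keep the mean at 1010g and shrink the SD until 1000g
lies 2.5 SDs below the mean: 10g / 2.5 = 4g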
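The closing arithmetic, 10g / 2.5 = 4g, can be read as shrinking the SD until 1000g sits 2.5 SDs below the 1010g mean. The Python sketch below works under that reading. The alternative of keeping the SD at 20g and raising the mean to 1000 + 2.5 × 20 = 1050g is computed for comparison and is an assumption, not a figure from the slides.

```python
from statistics import NormalDist

TARGET = 1000                 # labelled weight in grams
K = 2.5                       # SDs wanted between the target and the mean
current_mean, current_sd = 1010, 20

# Option 1 (assumed): keep the SD, raise the mean.
new_mean = TARGET + K * current_sd          # 1050.0
# Option 2: keep the mean, shrink the SD.
new_sd = (current_mean - TARGET) / K        # 4.0

frac_raise_mean = NormalDist(new_mean, current_sd).cdf(TARGET)
frac_shrink_sd = NormalDist(current_mean, new_sd).cdf(TARGET)

print(new_mean, new_sd)
print(f"{frac_raise_mean:.2%}", f"{frac_shrink_sd:.2%}")  # both well under 1%
```

Both options put 1000g at z = -2.5, so they leave the same small fraction of underweight bags. The difference is cost: the first gives away about 40g more sugar per bag on average, while the second requires a more accurate filling process.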