
MATH 621

Dr. Asim Khwaja
Lecture 06a
The Mean and the SD
• By now we have seen how the average and SD work together
to give us a very good idea of where a distribution is situated

• We stand at the average and walk a few SDs on either side and
we pick up the bulk of the distribution, no matter what the shape of
the histogram

• And that is a very useful thing because we really get a sense
from these two numbers of where our data is located
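
The "walk a few SDs on either side" idea can be tried on deliberately skewed data. The sketch below is only an illustration: the data are made up (simulated exponential values), and the guaranteed minimum quoted in the output is Chebyshev's inequality, which says at least 1 - 1/k² of any data set lies within k SDs of its mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Made-up, strongly right-skewed data (simulated exponential values).
data = [random.expovariate(1.0) for _ in range(10_000)]

mean = statistics.fmean(data)
sd = statistics.pstdev(data)  # population SD

def fraction_within(k):
    """Fraction of the data lying strictly within k SDs of the mean."""
    return sum(abs(x - mean) < k * sd for x in data) / len(data)

for k in (2, 3):
    guaranteed = 1 - 1 / k**2  # Chebyshev's lower bound
    print(f"within {k} SDs: {fraction_within(k):.3f} (at least {guaranteed:.3f})")
```

For most real data the observed fractions are well above the guaranteed minimum, which is why the mean and SD together locate the data so well.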
The Bell Curve (Normal Curve)
• But we would notice that while we can roughly see where the median
is located (the halfway point of the data), where the mean is located
(the point where the distribution balances itself on a fulcrum), it is not
so easy to see where the SD is

• What we are going to do is introduce the great bell curve – an

approximation to many distributions of data and many distributions of

• And this is one distribution where we can see the SD instantly as

well as locate the center almost exactly
The Bell Curve (Normal Curve)
• And once we have a bell curve with the mean and the SD, that
is all we need to calculate all proportions under this curve

• So, we’ll see when to use the bell curve as an approximation for
data, how to use it as an approximation for data and when not
to use it
Shapes of Distributions
• We have seen a number of distributions and they all have
varied shapes – very skewed, skewed, and symmetrical
Density Curves
• A density curve is a graphical and mathematical
representation/model of a quantitative variable where the
outcomes are continuous.

• It shows probability.

• The total area under the curve equals 1.

• The area under a density curve and above any range of values
is the relative frequency of all observations that fall in that
range.

• The above density curve is a graph of how body weights are
distributed.


Density Curves
• A smooth density curve is an idealization that gives the overall pattern of
the data but ignores minor irregularities.

• Mathematical models are idealized descriptions.

• They allow us to easily make many statements in an idealized world.

• These statements are useful when the idealized world is similar to the real
world.

• We first discuss density curves in general and then focus on a special

class of density curves, the bell-shaped Normal curves.

Density Curves
• Why is the area under the curve = 1?

• Consider the uniform distribution (right).

• It isn’t a curve, but the principle is the same: the total area
under the bars = 1.

• It denotes the probability of a coin toss coming heads or tails.

(0.5 x 1 + 0.5 x 1) = 1
Density Curves
• If we add more bars to the graph (we get a histogram), we get
something that’s starting to look like a curve.

• If we add up all of the areas of these rectangles, they will equal 1.
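
A small sketch of why the rectangle areas add up to 1 when a histogram is drawn on the density scale. The sample and the number of bins are made up for illustration; the only point is that each bar's height is its relative frequency divided by the bin width.

```python
import random

random.seed(1)

# Made-up sample: 1,000 simulated bell-shaped values.
data = [random.gauss(0, 1) for _ in range(1000)]

n_bins = 20  # arbitrary choice
lo, hi = min(data), max(data)
width = (hi - lo) / n_bins

# Count how many observations land in each bin.
counts = [0] * n_bins
for x in data:
    i = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
    counts[i] += 1

# Density scale: bar height = relative frequency / bin width,
# so bar area (height * width) = relative frequency.
heights = [c / len(data) / width for c in counts]
total_area = sum(h * width for h in heights)
print(round(total_area, 6))
```

Because the relative frequencies of all bins must add up to 1, the total area is 1 no matter how many bars are used.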

Density Curves
• It gives us a good idea of the shape of a distribution.

• It shows us whether or not the distribution has one or more

peaks of frequently occurring values.

• It shows us whether or not the distribution is skewed to the left

or right.

Density Curves
Normal Distribution (Bell Curve)
• One of the density curves is the Bell Curve.

• It’s known by different names – the Normal distribution, the bell curve, and the
Gaussian curve

• Many variables closely follow a Normal distribution like:

• Heights of people
• Size of things produced by machines
• Errors in measurements
• Blood pressure
• Marks on a test

• It indicates that extremely small or extremely large values are rare and the bulk of
the values lie around the mean
The Quincunx Demo
The Normal Distribution
• Many variables follow this kind of distribution

• But not all – many variables don’t

• In fact, more variables don’t than do

The Normal Distribution
• However, for those that do, it is possible to use that bell to
quickly get approximations to areas in the histograms

• So, we’re going to start out with the examination of the bell-
shaped curve that underlies all such bell-shaped curves

• It is the parent of them all

• And it is called the standard normal curve

The Standard Normal Curve
The Standard Normal Curve
• Now we will be interested in finding out what percentage of data
values fall within any given range (e.g. if we are talking about the height
distribution of people in Karachi, what percentage of people are
between 4.5 feet and 5.5 feet tall)

• That requires finding areas under that portion of the curve

• The interesting thing about this curve is that its integral from point a
to point b cannot be expressed in closed form (i.e. as a formula) in terms
of the usual standard functions (polynomials, exponentials, etc.)
The Standard Normal Curve
• Well, we still need the areas under portions of this curve

• So we approximate!

• And this practice prevails over the entire field of statistics and
probability – what we don’t know, we approximate!
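
One way to approximate such an area, sketched here under stated assumptions: slice the range into thin trapezoids and add them up. The function names and the number of slices are illustrative choices, not part of the lecture. `math.erf` is a standard-library function that is itself evaluated numerically, and it is used only as a cross-check.

```python
import math

def phi(z):
    """Height of the standard normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area_between(a, b, n=1000):
    """Approximate the area under the standard normal curve from a to b
    by adding up n thin trapezoids (n is an arbitrary choice)."""
    h = (b - a) / n
    total = (phi(a) + phi(b)) / 2
    for i in range(1, n):
        total += phi(a + i * h)
    return total * h

def area_via_erf(a, b):
    """Same area using math.erf, for comparison."""
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cdf(b) - cdf(a)

print(round(area_between(-1, 1), 4), round(area_via_erf(-1, 1), 4))
```

Printed z-tables are built in essentially this way: the areas are computed numerically once and then looked up.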
Useful to Remember
• Central areas (since the curve is symmetric):
• Between z = -1 and z = 1: about 68%
• Between z = -2 and z = 2: about 95%

• Tail areas:
• To the left of -1: about 16%
• To the right of 1: about 16%
• To the left of -2: about 2.5%
• To the right of 2: about 2.5%

• Percentiles:
• 95th percentile: z = 1.65, roughly
• 5th percentile: z = -1.65, roughly
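
These rounded figures can be checked with Python's standard-library `statistics.NormalDist`. This is only a quick sketch, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

central_1 = z.cdf(1) - z.cdf(-1)   # area between -1 and 1
central_2 = z.cdf(2) - z.cdf(-2)   # area between -2 and 2
left_of_minus_1 = z.cdf(-1)        # tail to the left of -1
right_of_2 = 1 - z.cdf(2)          # tail to the right of 2
p95 = z.inv_cdf(0.95)              # 95th percentile
p5 = z.inv_cdf(0.05)               # 5th percentile

print(f"{central_1:.3f} {central_2:.3f} {left_of_minus_1:.3f} "
      f"{right_of_2:.3f} {p95:.3f} {p5:.3f}")
```

The tail beyond 2 comes out slightly under 2.5%, because the 2.5% figure follows from rounding the central area to 95%.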
The General Form of the Normal
Curve Family
The General Form of the Normal
• The normal curve with mean μ and the SD σ
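
For reference, the standard textbook equation of the normal curve with mean μ and SD σ is written out below. Setting μ = 0 and σ = 1 gives the standard normal curve.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}},
\qquad -\infty < x < \infty
```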
Benefit of the Standard Scale z
• There is not just one normal curve but a family of them, differentiated by their
mean and SD

• There is no formula into which we can plug two values and get the area under
the normal curve between those two points

• We need to approximate the area through some means (like a Taylor series)


• Also, we need the z scale to compare different normal curves
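
A minimal sketch of why one standard curve is enough: converting the endpoints of a range to z gives the same area as working on the original scale. The mean of 80 and SD of 8 are borrowed from the test A example in this lecture, and the range 72 to 96 is an arbitrary choice for illustration.

```python
from statistics import NormalDist

mu, sigma = 80, 8   # mean and SD of the original scale (test A example)
a, b = 72, 96       # arbitrary range of scores, chosen for illustration

def to_z(x):
    """Convert a value on the original scale to the standard scale."""
    return (x - mu) / sigma

original = NormalDist(mu, sigma)
standard = NormalDist()  # mean 0, SD 1

area_original = original.cdf(b) - original.cdf(a)
area_standard = standard.cdf(to_z(b)) - standard.cdf(to_z(a))
print(round(area_original, 4), round(area_standard, 4))
```

So a single table for the standard normal curve serves every member of the family.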

The 68-95-99.7 Rule
E.g.: Using z-scale for
• A person takes test A and scores 84 on it. The scores for this test are
normally distributed with a mean of 80 and an SD of 8. The same
person also takes test B and scores 28 on it. The scores of test B are
also normally distributed, with a mean of 20 and an SD of 6.
Compare the performance of this person on the two tests.

The z score of the person in test A is:

z = (x – μ) / σ = (84 – 80) / 8 = 0.5

The z score of the person in test B is:

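z = (x – μ) / σ = (28 – 20) / 6 ≈ 1.33

The person is 0.5 SD above the mean on test A but about 1.33 SDs above the mean on test B, so relative to the other test-takers the performance on test B is better.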
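
The two z scores can be reproduced in a few lines. The helper name `z_score` is just an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of SDs by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(84, 80, 8)   # test A
z_b = z_score(28, 20, 6)   # test B
better = "B" if z_b > z_a else "A"
print(f"z_A = {z_a:.2f}, z_B = {z_b:.2f}, relatively better on test {better}")
```

Comparing the raw scores (84 and 28) would be misleading, since the two tests have different means and spreads.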
Z-scale – It Makes Decisions Easy – Relative Grading
A professor is marking a test. Here are the students’ results (out of 60 points):
20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17
Most students didn’t even get 30 out of 60, and most will fail.

The test must have been really hard, so the Prof decides to standardize all the
scores and only fail people who are more than 1 SD below the mean.

The mean is 23 and the SD is 6.6, and these are the standardized scores:

-0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91

Only 2 students will fail (the ones who scored 15 and 14 on the test)
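
The grading rule can be sketched as follows. It assumes the SD is the population SD (dividing by the number of scores), which is the version that reproduces the 6.6 quoted above.

```python
import statistics

scores = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)  # population SD (assumption, see above)

# Standardize every score, then fail anyone more than 1 SD below the mean.
z_scores = [(x - mean) / sd for x in scores]
failing = [x for x, z in zip(scores, z_scores) if z < -1]

print(mean, round(sd, 1), failing)
```

Standardized scores computed with the unrounded SD can differ in the last digit from those listed above, which appear to use the rounded value 6.6.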
Another Application of the Normal Distribution
Your company packages sugar in 1 kg bags. When you weigh a
sample of bags you get these results:
- 1007g, 1032g, 1002g, 983g, 1004g, … (a hundred measurements)
- Mean = 1010g
- SD = 20g

Some values are less than 1000g – how can you fix that?
The normal distribution of your measurements looks like this:

31% of the bags are less than 1000g, which is cheating the customer!
It is a random thing, so we can’t stop bags having less than
1000g, but we can try to reduce it a lot.


By adjusting the machine.

At the moment, 1000g is 0.5 SD below the mean (z = -0.5), causing about 31% of
the bags to be underweight.

If we adjust the machine so that 1000g is 2.5 SD below the mean (z = -2.5),
only about 0.6% of the lot will be underweight.
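
Both percentages are areas to the left of a z value on the standard normal curve. The quick check below uses the standard library and assumes, as the example does, that bag weights are normally distributed.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve

now = z.cdf(-0.5)     # fraction of bags under 1000g at present
target = z.cdf(-2.5)  # fraction under 1000g after the adjustment

# Same calculation on the original gram scale (mean 1010g, SD 20g).
now_in_grams = NormalDist(1010, 20).cdf(1000)

print(f"{now:.1%} {target:.1%} {now_in_grams:.1%}")
```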
Now, we can adjust in two ways:
- Increase the amount of sugar in each bag (which changes the mean),
- Make the machine more accurate (which changes the standard deviation,
reducing it)

The first one may be cheaper in the early stages but causes
more losses to you in the long run as you are giving away more
sugar for free.

The second one may require buying a more accurate machine,

so more costly early on but cheaper in the long run.
• Adjust the mean amount in each bag

The SD is 20g (accuracy of the machine), and we need 2.5 of them:
2.5 x 20g = 50g

So, the machine should average 1050g, like this:
• Adjust the accuracy of the machine

Or we can keep the same mean of 1010g, but then we need 2.5 SD to be equal to 10g:

10g / 2.5 = 4g

So, the SD should be 4g, rather than 20g, like this:

Or perhaps we could have some combination of both better accuracy and slightly larger
average size.
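
The two adjustments can be worked out side by side. This is a sketch under the example's assumption of normally distributed weights; the variable names are illustrative.

```python
from statistics import NormalDist

limit = 1000          # labelled weight in grams
mean, sd = 1010, 20   # current machine settings from the example
k = 2.5               # how many SDs the limit should sit below the mean

new_mean = limit + k * sd     # option 1: keep the SD, raise the mean
new_sd = (mean - limit) / k   # option 2: keep the mean, shrink the SD

# Fraction of underweight bags under each option.
under_1 = NormalDist(new_mean, sd).cdf(limit)
under_2 = NormalDist(mean, new_sd).cdf(limit)

print(new_mean, new_sd, f"{under_1:.2%}", f"{under_2:.2%}")
```

Both options put 1000g at 2.5 SD below the mean, so they give the same underweight fraction. They differ only in how much extra sugar is given away per bag.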
Comparing the Two Distributions
• Show PDFs for two different types of z-tables
The End

